Extracting complicated text

Eilip999
Posts: 3
Joined: Fri Nov 30, 2018 4:44 pm

Extracting complicated text

Post by Eilip999 » Fri Nov 30, 2018 5:51 pm

iMacros Browser v8.3 (trial)
Windows 10
Hey, I am trying to make a simple script that solves an internet test for me automatically.
I made a script that works well with simple text, but sometimes the text is confusing. The text on both sites looks the same.
Here is my script:

Code: Select all

VERSION BUILD=8032216
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.odpowiedzi.c0.pl/algorytmika_programowanie/jak_wnioskuja_maszyny.html
TAB OPEN
TAG POS=1 TYPE=H4 ATTR=* EXTRACT=TXT
SET VAR1 {{!extract}}
SET !EXTRACT NULL
TAG POS=188 TYPE=P ATTR=* EXTRACT=TXT
SET VAR2 {{!extract}}
TAB T=2
URL GOTO=https://it-szkola.edu.pl/kkurs,kurs,59,test
TAG POS=1 TYPE=TH ATTR=TXT:*{{VAR1}}*
TAG POS=R1 TYPE=LABEL FORM=ID:testForm ATTR=TXT:{{VAR2}}
It copies the question and the correct answer, then opens the site with the test, searches for the question, and selects the correct answer.

Site with answers code:

Code: Select all

        <h4>Question?</h4>
        <p>answer</p>
        <br>

Site with test:

Code: Select all

<div class="testPyt  ">
                                <table>
                                    <tr>
                                        <th colspan="2">
                                            <span class="testPytLiczFull"><span class="testPytLicz">3</span>. </span>
                                             
                                            Question?<br />                                     </th>
                                    </tr>
                                                                            <tr>
                                            <td class="testCheck"><input type="checkbox" name="idodp[830845]" id="idodp_828949_830845" value="1" /></td>
                                            <td class="testQuest"><label for="idodp_828949_830845">ANSWER1</label></td>
                                        </tr>
                                                                                <tr>
                                            <td class="testCheck"><input type="checkbox" name="idodp[830846]" id="idodp_828949_830846" value="1" /></td>
                                            <td class="testQuest"><label for="idodp_828949_830846">ANSWER2</label></td>
                                        </tr>
                                                                                <tr>
                                            <td class="testCheck"><input type="checkbox" name="idodp[830844]" id="idodp_828949_830844" value="1" /></td>
                                            <td class="testQuest"><label for="idodp_828949_830844">ANSWER3</label></td>
                                        </tr>
                                        
                                </table>

                                <br /><br />
                            </div>
When the question is a single sentence it works perfectly, but sometimes it looks like this.

Site with answers:

Code: Select all

<h4>Question about script below:<br>

<script type="text/javascript"><br>
  z = 1;<br>
  function a()<br>
  {<br>
    var z = 2;<br>
  }<br>
  a();<br>
  alert(z)<br>
</script></h4>
        <p>Answer</p>
Test site code:

Code: Select all

<th colspan="2">
                                            <span class="testPytLiczFull"><span class="testPytLicz">10</span>. </span>
                                             
                                            Co będzie wynikiem wykonania następującego skryptu:<br /><br />
  <script type="text/javascript"><br />
  &nbsp;&nbsp;z = 1;<br />
  &nbsp;&nbsp;function a()<br />
  &nbsp;&nbsp;{<br />
  &nbsp;&nbsp;&nbsp;&nbsp;var z = 2;<br />
  &nbsp;&nbsp;}<br />
  &nbsp;&nbsp;a();<br />
  &nbsp;&nbsp;alert(z)<br />
  </script>                                        </th>
When I run the code I get an error saying it can't find a matching element.

Code: Select all

Error -1300: Cannot find HTML element of type "TH:" with attribute(s) "TXT:*Co będzie wynikiem wykonania następującego skryptu: <script type="text/javascript">z = Array();z.push("ABC");z.push("D");z.push("E");alert(z[1])</script>*".. Line 13: TAG POS=1 TYPE=TH ATTR=TXT:*{{VAR1}}*

The text on both sites looks the same.
Is there any universal solution that works with complicated text without difficult JavaScript?
chivracq
Posts: 7722
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Fri Nov 30, 2018 5:55 pm

OK, was waiting for you to post your Post to approve it, long one, have to go, will have a look when I'm back...

Hum, iMB v8.3 => 'Trial'...!, hum, sure about that...?, that Version must be 8 years old, I'm not even sure it works on Win10...
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
Eilip999
Posts: 3
Joined: Fri Nov 30, 2018 4:44 pm

Re: Extracting complicated text

Post by Eilip999 » Fri Nov 30, 2018 6:01 pm

I use an old version of the iMacros Browser because the new versions of the add-on for Mozilla Firefox are restricted to 50 lines of code. The 8.9.7 version doesn't restrict, so I use older versions of iMacros Browser to code :D I can download the latest trial version, but I want it to be compatible with the 8.9.7 Firefox add-on.
chivracq
Posts: 7722
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Sat Dec 01, 2018 12:49 am

OK, I'm back, I still haven't read your Post to be honest, because your Thread Title is a bit (completely) vague, this complete-whole Sub-Forum is about "Extracting", when you extract stg with iMacros, it's always about "text", and "complicated" is a completely subjective Term which doesn't mean much, see for example this recent Thread about "complicated-complicated-very-very-hard-question"...

(What is "complicated" for you will probably take me a few Seconds to solve, once I will have found the "Motivation" to read your Post..., which is nearly perfect btw, except the Thread Title that I find "vague" as I don't have any Idea from the Title about what your Pb might be about, but you've put a lot of nice Effort in it with several Blocks of Info...)

Your FCI is still a bit of "a mystery" to me. Yes, the latest Versions of the 3 Add-ons for IE/CR/FF are now limited to 50 Lines, but iMB is not, even as a Trial, so why use v8.3 and not v12.5...? If you are using a Trial Version, then use the latest one, I would think...
This Sentence doesn't make sense: "8.9.7 version doesn't restrict, so I use older versions of iMacros Browser to code..."
=> Well, then simply use v8.9.7, which is a Version for FF (up to FF56) and has nothing to do with iMB, unless you are also using the Scripting Interface, which doesn't seem to be the case, or you should have opened your Thread in the 'Scripting Interface' Sub-Forum...

But OK, maybe I'm acting a bit like a "Jerk", I post this one and I read your Post, ah-ah...! :wink:

But hum, I might reply once, but you'll need to "improve" your Thread Title to make it a bit more Descriptive anyway... 8)
Last edited by chivracq on Sat Dec 01, 2018 11:57 am, edited 2 times in total.
chivracq
Posts: 7722
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Sat Dec 01, 2018 1:06 am

Hum, OK, read your OP..., but hum, it's a bit about "cheating" on some Qt & Answer Game/Competition Site, then, oops, sorry, but I don't help for that... :shock:
Nice to want to automate your "Playing" and Scoring to try to be more "clever" than your Competitors, but you need to do that on your own, or everybody will soon use the same Script, and that Game/Competition will only become a Bot vs Bot Competition which is probably not the Intention of the Site Owner... :cry:
And if it is, can you ask if they can make some English Version...?, and I'll join in the Game...! :twisted:

I actually acquired 90% of my Knowledge of iMacros from also playing that kind of Games, but tja..., I never asked anybody to help me with any of my Scripts, or the "Competition" could/would have used "my" Scripts... :shock: , and you see, a few years later, I'm now able to answer maybe 90% of all Threads posted on the Forum..., so you are on the right "Path", ah-ah...! 8)

I'm the most "active" Helper on the Forum, if you are a bit "lucky" some other Advanced User(s) might still want to help you, I would be a bit "surprised" though to be honest, you'll see... Your OP is actually "perfect" for me, I prefer "too much" Info (that I can filter easily, I'm used to work with very complex Systems), but most (Advanced) Users prefer some "simple" Qt they can understand within less than 30 sec...
You could try SOF maybe (Stackoverflow), if you don't get any further Replies here, but hum..., try to "simplify" and to summarize your "real" Qt a lot for that Forum... :idea:
But good luck anyway... :wink:
chivracq
Posts: 7722
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Sat Dec 01, 2018 1:14 pm

Hum, I had a look at your Site(s) anyway. I can only access the Site with the Answers, the Test Site is behind a Login & Password...

Well, you could add a little bit of "Intelligence" to your Script by shortening the Extract of the Question a bit, something like:

Code: Select all

VERSION BUILD=8032216
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.odpowiedzi.c0.pl/algorytmika_programowanie/jak_wnioskuja_maszyny.html
SET KW1 " skryptu:"
SET KW2 "Keyword(s) Nb2..."
SET KW3 "... etc..."

TAG POS=1 TYPE=H4 ATTR=* EXTRACT=TXT
'SET VAR1 {{!extract}}
SET !VAR1 EVAL("var s='{{!EXTRACT}}', kw1='{{KW1}}', kw2='{{KW2}}', kw3='{{KW3}}'; var z=s.split(kw1)[0].split(kw2)[0].split(kw3)[0]; z;")
PROMPT EXTRACT:<BR>_{{!EXTRACT}}_<BR><BR>Short<SP>EXTRACT:<BR>_{{!VAR1}}_
'>
SET !EXTRACT NULL
'TAG POS=188 TYPE=P ATTR=* EXTRACT=TXT
TAG POS=R1 TYPE=P ATTR=* EXTRACT=TXT
SET !VAR2 {{!extract}}

TAB OPEN
TAB T=2
URL GOTO=https://it-szkola.edu.pl/kkurs,kurs,59,test
TAG POS=1 TYPE=TH ATTR=TXT:*{{!VAR1}}*
TAG POS=R1 TYPE=LABEL FORM=ID:testForm ATTR=TXT:{{!VAR2}}
(Not tested...)

I declared the Keyword(s) outside of the 'EVAL()' at the beginning of your Script for "Easy Access" and in case you want to add some extra Keywords...
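To make the chained 'split()' a bit easier to follow outside of iMacros, here is a minimal standalone JavaScript sketch of the same idea: keep only the Text in front of the first Keyword that is found. The function name `shortenExtract` and the sample Question are made up for illustration, they are not part of iMacros or of either Site.

```javascript
// Hypothetical helper mirroring the chained split() inside the EVAL() above:
// keep only the text that comes before the first occurrence of any keyword.
function shortenExtract(text, keywords) {
  let result = text;
  for (const kw of keywords) {
    // Skip empty keywords: "abc".split("") would split into single characters.
    if (kw) {
      result = result.split(kw)[0];
    }
  }
  return result;
}

// Made-up sample extract, shaped like the one in the error message.
const question =
  'Co będzie wynikiem wykonania następującego skryptu: <script type="text/javascript">z = 1;</script>';

const short = shortenExtract(question, [" skryptu:", "Keyword(s) Nb2...", "... etc..."]);
console.log(short);
```

A Keyword that does not occur in the Text leaves it unchanged, so unused placeholder Keywords should be harmless.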

And you could also add a Conditional 'PROMPT', Sound or Pause (+ a Conditional 'WAIT' to give you time to pause the Script manually, if combined with 'PROMPT' or Sound) by first checking whether the 'TH' Element is (not) found by iMacros on the Test Site...

As the 'split()' shortens your Extract from the Site with the Answers, you may run into the Case where 2 different Questions both start with "Co będzie wynikiem wykonania następującego skryptu:" but the Script after that is different, => you might also add a Check whether by any chance there is a second match:

Code: Select all

TAG POS=2 TYPE=TH ATTR=TXT:*{{!VAR1}}*
And all the "Conditional + whatever" I have mentioned are covered by other Threads on the Forum if you search it a bit... :idea:
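The duplicate Check mentioned above comes down to counting how many Questions contain the shortened Text. A small JavaScript sketch of that Logic, assuming you already have the Question Texts as Strings (the helper `findMatches` and the sample Data are hypothetical, not taken from the real Test Site):

```javascript
// Hypothetical helper: return the questions whose text contains the shortened extract.
function findMatches(questions, shortText) {
  return questions.filter((q) => q.includes(shortText));
}

// Made-up sample data standing in for the TH texts on the test page.
const questions = [
  "Co będzie wynikiem wykonania następującego skryptu: z = 1; alert(z)",
  "Co będzie wynikiem wykonania następującego skryptu: z = Array(); alert(z[1])",
  "Question?",
];

const matches = findMatches(questions, "Co będzie wynikiem wykonania następującego");
if (matches.length > 1) {
  // More than one hit means the shortened text alone cannot identify the question.
  console.log("Ambiguous: " + matches.length + " questions match, compare the script part too.");
}
```

If more than one Question matches, the shortened Text is not enough on its own and the Part after the Keyword has to be compared as well.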

But hum, you should be able to find all those "Tricks" and Techniques yourself if you are studying Maths and Programming from what I understand from this Answer Site, I have the "Feeling" I'm helping you a bit to "cheat the System", but you put a lot of Effort in your OP, so... OK, fair enough..., and enjoy...! :wink:
Eilip999
Posts: 3
Joined: Fri Nov 30, 2018 4:44 pm

Re: Extracting complicated text

Post by Eilip999 » Tue Dec 04, 2018 12:32 pm

Thx for help. I will try your code out. My IT teatcher requires to copy and paste these answers to rise my Scholl in ranking. I just want to make scrpit that will solve it for me.
chivracq
Posts: 7722
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Thu Dec 06, 2018 10:37 pm

OK, and...?, any Results/Feedback/Follow-up...? :?:
(You don't need 36h to "try" my Code, done in a few minutes I would think..., already took 2 or 3 days for the first Follow-up, you need to "speed up the Process" a little bit or I'll lose interest in your Thread, I was not very-very motivated to answer it in the first place... :shock: )

>>>

Your last Post is a bit full of Typos and grammatical Mistakes btw, Communication & Documentation Language about Programming is English, your Teacher could "better" translate those 2 Sites into English and make sure his/her Students master English "a bit properly", that would raise/increase the Ranking of your School a bit automatically and probably much quicker than some C&P from one Site to another one... :idea:
Most Prog Languages don't like Typos at all, all you get is then some "Syntax Error"...! :idea:
["to rise" is Intransitive btw, "stg can rise", but "you cannot rise stg"..., but you can raise/increase/augment/improve stg..., those Verbs are Transitive... (And English is not my Native Language either...) :wink: ]