Extracting Text from Onmouseover

newmember
Posts: 20
Joined: Mon Mar 17, 2014 7:01 am

Extracting Text from Onmouseover

Post by newmember » Mon May 09, 2016 3:39 am

Hello,
I want to extract the text shown by a mouseover event on this page: https://chinesepod.com/tools/glossary/entry/fish

For example, with the second sentence:

                    <td width="80%" style="font-size:18px;">
                        <span onclick="onWordClick()" onmouseover="tip(event,'he','ta1','他','他')" onmouseout="htip()">他</span><span onclick="onWordClick()" onmouseover="tip(event,'not','bu4','不','不')" onmouseout="htip()">不</span><span onclick="onWordClick()" onmouseover="tip(event,'to like','xi3huan5','喜欢','喜歡')" onmouseout="htip()">喜欢</span><span onclick="onWordClick()" onmouseover="tip(event,'to eat','chi1','吃','吃')" onmouseout="htip()">吃</span><span onclick="onWordClick()" onmouseover="tip(event,'fish','yu2','鱼','魚')" onmouseout="htip()">鱼</span>。                        <br />
                        (He doesn't like to eat fish.)
                    </td>
The extracted data should be:

- he not to like to eat fish
- ta1 bu4 xi3huan5 chi1 yu2
Can you tell me how to do it? I'd really appreciate it. Thank you very much! :D


Below is my iMacros information:
1. What version of iMacros are you using?
VERSION BUILD=8970419 RECORDER=FX

2. What operating system are you using? (please also specify language)
Windows 7 Ultimate x64

3. Which browser(s) are you using? (include version numbers)
Firefox 46.0.1

4. Do the included demo macros work ok?
Yes
chivracq
Posts: 8698
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting Text from Onmouseover

Post by chivracq » Mon May 09, 2016 9:27 am

Hum..., interesting case and site. I had a quick look at your site, and achieving what you want will indeed be a bit tricky, though not "gigantically" complicated if you understand the method I use, for which I've already posted several examples on the forum...

In a nutshell, all the data you are after is contained in the 'EXTRACT=HTM' of the 'TD' element, each piece in its own 'SPAN' element, and you can isolate exactly the data you want to keep using 'EVAL()' + 'split()' (two splits per group of letters).
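
Here is a minimal JavaScript sketch of that double split, i.e. the kind of expression an 'EVAL()' would run on one 'SPAN' element's HTML. It assumes the 'tip()' arguments keep the order seen in the posted HTML (English, pinyin, simplified, traditional) and that none of them contains a single quote; the function name `parseTip` is hypothetical.

```javascript
// Hypothetical helper: isolate the English gloss and pinyin from one SPAN's HTML.
// Assumes the mouseover call looks like tip(event,'english','pinyin','简','繁')
// and that no argument contains a single quote.
function parseTip(spanHtml) {
  // First split: keep everything after the opening of the tip() arguments.
  const afterTip = spanHtml.split("tip(event,'")[1];
  if (afterTip === undefined) return null; // no tip() call in this HTML
  // Second split: on single quotes, the arguments sit at indexes 0, 2, 4, 6
  // (the odd indexes hold the "," separators).
  const parts = afterTip.split("'");
  return { english: parts[0], pinyin: parts[2] };
}

const span =
  "<span onclick=\"onWordClick()\" onmouseover=\"tip(event,'to like','xi3huan5','喜欢','喜歡')\" onmouseout=\"htip()\">喜欢</span>";
console.log(parseTip(span)); // { english: 'to like', pinyin: 'xi3huan5' }
```

In a macro, the HTML would come from '{{!EXTRACT}}' instead of a hardcoded string, and the quoting of the split delimiters inside 'EVAL("...")' needs some care.
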
The slightly complicating bit is that the phrases are not all the same length (between 1 and about 20 groups of letters on the page you've provided). If you go for one group of letters per loop, you will need to loop your macro, and saving the data on the final loop will require a temp file or the clipboard if you want to save it all together in one row of your '.CSV' with 'SAVEAS'... Otherwise you'd already need nested loops for one sentence, and as I guess you will want your macro to extract all the data for the whole page in one go, that would require a nested loop within nested loops (3 levels).
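
As a rough illustration of the nested-loop data flow (outer loop over sentences, inner loop over groups of letters), here is a JavaScript sketch that turns the HTML of several 'TD' elements into one CSV row per sentence. It assumes each group's 'tip()' call has the English gloss first and the pinyin second; the function name `sentencesToCsv` is hypothetical, and plain arrays stand in for the temp file or clipboard a macro would use.

```javascript
// Hypothetical sketch: one CSV row per sentence, built with two nested loops.
function sentencesToCsv(tdHtmlList) {
  // CSV quoting: double every inner double quote and wrap the field in double quotes.
  const quote = (s) => '"' + s.replace(/"/g, '""') + '"';
  const rows = [];
  for (const tdHtml of tdHtmlList) {            // outer loop: one TD = one sentence
    const english = [];
    const pinyin = [];
    const chunks = tdHtml.split("tip(event,'"); // chunks[0] is the HTML before the first group
    for (let i = 1; i < chunks.length; i++) {   // inner loop: one group of letters
      const args = chunks[i].split("'");        // args[0] = English, args[2] = pinyin
      english.push(args[0]);
      pinyin.push(args[2]);
    }
    rows.push(quote(english.join(" ")) + "," + quote(pinyin.join(" ")));
  }
  return rows.join("\n");
}

const td =
  "<td width=\"80%\"><span onmouseover=\"tip(event,'he','ta1','他','他')\">他</span>" +
  "<span onmouseover=\"tip(event,'not','bu4','不','不')\">不</span>" +
  "<span onmouseover=\"tip(event,'to like','xi3huan5','喜欢','喜歡')\">喜欢</span>" +
  "<span onmouseover=\"tip(event,'to eat','chi1','吃','吃')\">吃</span>" +
  "<span onmouseover=\"tip(event,'fish','yu2','鱼','魚')\">鱼</span>。<br />" +
  "(He doesn't like to eat fish.)</td>";
console.log(sentencesToCsv([td]));
// "he not to like to eat fish","ta1 bu4 xi3huan5 chi1 yu2"
```

In the macro itself, the looping and the 'SAVEAS' would of course be done by iMacros; this only shows how the pieces end up together in one row.
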

Instead of using (nested) loops, you could hardcode about 20 blocks, which seems to match the maximum number of letter groups per sentence, and, still using 'EVAL()', output the data you want to isolate, or an empty string once you've reached the end of the sentence. The 'length' of the 'split()' array on "<span" (minus 1) will give you the number of letter groups in each sentence.
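
A sketch of the hardcoded-blocks variant in JavaScript, assuming every group of letters is one '<span' in the 'TD' HTML and that the 'TD' contains no other 'SPAN' elements. Each call to the hypothetical `englishAt()` stands for one hardcoded block: it returns that group's English gloss, or an empty string once the block number is past the end of the sentence. `MAX_BLOCKS = 20` just mirrors the "about 20" estimate.

```javascript
// Hypothetical sketch of the "hardcoded blocks" variant.
const MAX_BLOCKS = 20; // "about 20" groups per sentence on the posted page

function countGroups(tdHtml) {
  // Each group of letters opens with "<span" ("</span>" does not match),
  // so the number of pieces minus 1 is the number of groups.
  return tdHtml.split("<span").length - 1;
}

// One call = one hardcoded block: the n-th group's English gloss,
// or an empty string once n is past the end of the sentence.
function englishAt(tdHtml, n) {
  if (n < 1 || n > countGroups(tdHtml)) return "";
  const afterTip = tdHtml.split("<span")[n].split("tip(event,'")[1];
  return afterTip === undefined ? "" : afterTip.split("'")[0];
}

// The loop only stands in for writing MAX_BLOCKS blocks by hand.
function allBlocks(tdHtml) {
  const out = [];
  for (let n = 1; n <= MAX_BLOCKS; n++) out.push(englishAt(tdHtml, n));
  return out;
}

const td =
  "<td><span onmouseover=\"tip(event,'he','ta1','他','他')\">他</span>" +
  "<span onmouseover=\"tip(event,'not','bu4','不','不')\">不</span>。</td>";
console.log(countGroups(td));         // 2
console.log(englishAt(td, 2));        // not
console.log(englishAt(td, 3) === ""); // true
```

Splitting on "<span" rather than on "span" matters here: every group also has a closing '</span>', so a split on "span" alone would count each group twice.
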

But, pfff, there are several solutions: it's also possible to extract the same 'EXTRACT=HTM' data on each 'SPAN' element within a 'TD' element (= one sentence), using relative positioning to check whether you've reached the end of the sentence.

But hum, I'm not sure you will understand everything I am trying to explain; I'm afraid it might still be a bit complex..., oops...!
Well, good luck anyway...! But you'll need to be very systematic in your macro...
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
newmember
Posts: 20
Joined: Mon Mar 17, 2014 7:01 am

Re: Extracting Text from Onmouseover

Post by newmember » Thu May 12, 2016 3:26 am

Yes, I don't understand everything you said :mrgreen: Thank you for the very detailed explanation. I will try it myself :)
chivracq
Posts: 8698
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting Text from Onmouseover

Post by chivracq » Thu May 12, 2016 5:17 pm

Yep, I realize, but that comes from the site, which uses a complex HTML structure to produce this horizontal/vertical presentation in a mouseover, with variable lengths both for the number of letters within a "group of letters" and for the number of groups within a sentence...

Maybe you or some other (advanced) user will come up with a simpler method than mine. If you start digging into the code and post it once you've started and really get stuck, I'm willing to help you further...

If you go with (some parts of) my method, here are a few threads that might help you understand the techniques I mentioned, with working examples I previously posted:
- Re: Number of Options in a Select tag (about using the length of the 'split()' array to determine the number of 'groups of letters' in a sentence...)
- Re: How to extract a variable number (on how I use 'EVAL()' + a double 'split()' to isolate some (dynamic) data from an 'EXTRACT'; it's 'TXT' in that example, but the same principle applies to 'HTM'...)
The only extra difficulty compared with that last thread will be handling the single quotes, using a mix of (escaped) double quotes and single quotes either directly in the 'split()' or with an extra 'replace()', as I always (try to) use single quotes in my examples where possible, instead of the "official" way with escaped double quotes...
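
Two ways to split on a single quote when the JavaScript expression itself is written with single-quoted strings, sketched in plain JavaScript. The sample attribute comes from the question; the "|" placeholder is an arbitrary choice, assumed not to occur in the data. Inside an iMacros 'EVAL("...")', the double-quoted form `split("'")` would, as far as I know, additionally need its double quotes escaped as `\"`.

```javascript
// Two hypothetical ways to split on a single quote using single-quoted strings.
const html = "onmouseover=\"tip(event,'to eat','chi1','吃','吃')\"";

// (a) Escape the single quote inside a single-quoted delimiter.
const viaEscape = html.split('tip(event,\'')[1].split('\'');

// (b) First replace every single quote with a placeholder that is assumed
//     not to occur in the data ("|" here), then split on the placeholder.
const viaReplace = html.replace(/'/g, '|').split('tip(event,|')[1].split('|');

console.log(viaEscape[0], viaEscape[2]);   // to eat chi1
console.log(viaReplace[0], viaReplace[2]); // to eat chi1
```

Variant (b) avoids quote escaping in the delimiters altogether, at the cost of one extra 'replace()'.
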

>>>

Hum..., interesting case, I said, ah-ah...! I'm actually thinking of another method I used in another thread that might be a bit simpler (even if the user in that thread called it "ingenious" :oops: ), where I used a double extraction on 2 URLs (or the same one) to compare them for some further conditional logic... 8)
In your case it would be a double extraction on each 'SPAN' element (= group of letters): on 'HTM' to get the info from the mouseover, and on 'TXT' to get the Chinese letters, which you then reuse with 'EVAL()' in the 'HTM' extract to relatively locate the data you are interested in, with 'split()', 'str()', 'indexOf()', etc... Fairly simple to implement, I would think... And you could simply loop your macro on one group of letters (+ temp file or clipboard) or on one sentence for the whole page, with a conditional 'SAVEAS' each time you've reached the end of a phrase; that way you avoid the fairly complex concept of 3 levels of nested loops... :idea:
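
A JavaScript sketch of the double-extraction idea, assuming `htm` holds a 'SPAN' element's 'EXTRACT=HTM' and `txt` its 'EXTRACT=TXT'. It uses `indexOf()` to find the visible letters among the quoted 'tip()' arguments, then reads the English gloss and pinyin at positions counted back from there. The function name `locateByText` is hypothetical, and it assumes the visible text equals the third (simplified) argument and that no argument contains a single quote.

```javascript
// Hypothetical sketch: locate the mouseover data relative to the visible text.
function locateByText(htm, txt) {
  // The first quoted occurrence of the visible text is the simplified-form argument.
  const idx = htm.indexOf("'" + txt + "'");
  if (idx === -1) return null; // visible text not found among the tip() arguments
  // Everything before it ends with [..., english, ",", pinyin, ","] once split on "'".
  const before = htm.substring(0, idx).split("'");
  return {
    english: before[before.length - 4],
    pinyin: before[before.length - 2],
  };
}

const htm =
  "<span onclick=\"onWordClick()\" onmouseover=\"tip(event,'to like','xi3huan5','喜欢','喜歡')\" onmouseout=\"htip()\">喜欢</span>";
console.log(locateByText(htm, "喜欢")); // { english: 'to like', pinyin: 'xi3huan5' }
```

Detecting the end of a phrase for the conditional 'SAVEAS' is not covered by this sketch.
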
- Help on TAG or SEARCH

OK, good luck, and I'll be curious to see what you come up with, ah-ah...! :twisted:
NoraChoi
Posts: 6
Joined: Thu May 26, 2016 6:14 am

Re: Extracting Text from Onmouseover

Post by NoraChoi » Thu May 26, 2016 6:45 am

It would be better to have a tool to extract the content for you. :D

You can extract data (or information) from Onmouseover in the HTML source with Octoparse (http://www.octoparse.com/?=bl). Octoparse can extract any HTML text for you. Easy to use and free. :D
chivracq
Posts: 8698
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting Text from Onmouseover

Post by chivracq » Thu May 26, 2016 3:40 pm

Hum, looks like an interesting tool indeed, I had never heard of it...
I will give it a try, even if I'm always a bit "suspicious" about the quality of software when I see all the grammar + spelling mistakes in the FAQ...
NoraChoi
Posts: 6
Joined: Thu May 26, 2016 6:14 am

Re: Extracting Text from Onmouseover

Post by NoraChoi » Fri May 27, 2016 2:43 am

Hi chivracq.
:shock: I will edit the FAQ... :wink:
Just try Octoparse (download it at http://www.octoparse.com/download/) and extract what you want.
Tutorials for you here: http://www.octoparse.com/Tutorial.
Try it now. :D I am always happy to help~
janib4all
Posts: 132
Joined: Wed Jul 21, 2010 6:44 am
Location: Karachi, Sindh, Pakistan

Re: Extracting Text from Onmouseover

Post by janib4all » Tue Jun 07, 2016 8:49 pm

Try this:

' {{!extract}} is assumed to hold the long HTML text of the TD (EXTRACT=HTM)

Set FinalText Eval("var a = '{{!extract}}'.replace(/<[^>]+>/g, '').replace(/^\s+|\s{2,}|\s+$/g, ''); a;")
Prompt {{FinalText}}

This will convert the whole HTML into:
他不喜欢吃鱼。(He doesn't like to eat fish.)

If you need the content inside the ( ), add these lines afterward:

Set InsideBraces Eval("var a = \"{{FinalText}}\".match(/\((.+)\)/); a[1];")
Prompt {{InsideBraces}}
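
The same two regular expressions can be tried outside iMacros; here is a plain-JavaScript sketch (the function names are hypothetical, and the span attributes are left out of the sample because the tags are stripped anyway). The capture group is written as `(.+)`, i.e. any characters between the parentheses. Note that the sample text contains an apostrophe ("doesn't"), so it cannot be pasted unescaped into a single-quoted JavaScript string inside an EVAL.

```javascript
// Hypothetical plain-JavaScript version of the two EVAL expressions in this post.
function visibleText(html) {
  return html
    .replace(/<[^>]+>/g, "")           // drop every tag
    .replace(/^\s+|\s{2,}|\s+$/g, ""); // trim both ends and delete runs of 2+ whitespace characters
}

function insideParentheses(text) {
  const m = text.match(/\((.+)\)/);    // greedy: from the first "(" to the last ")"
  return m ? m[1] : "";                // empty string when there are no parentheses
}

// Sample TD with the span attributes left out (the tags are removed anyway).
const td =
  '<td width="80%" style="font-size:18px;">\n' +
  "    <span>他</span><span>不</span><span>喜欢</span><span>吃</span><span>鱼</span>。    <br />\n" +
  "    (He doesn't like to eat fish.)\n" +
  "</td>";

const text = visibleText(td);
console.log(text);                    // 他不喜欢吃鱼。(He doesn't like to eat fish.)
console.log(insideParentheses(text)); // He doesn't like to eat fish.
```
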
Hire the BoT-fReeak!
botspecialist.blogspot.com