Extracting Text from Onmouseover


by newmember on Sun May 08, 2016 8:39 pm

Hello,
I want to extract the text shown by a mouseover event on this page: https://chinesepod.com/tools/glossary/entry/fish

For example, take the second sentence:
Code:
                    <td width="80%" style="font-size:18px;">
                        <span onclick="onWordClick()" onmouseover="tip(event,'he','ta1','他','他')" onmouseout="htip()">他</span><span onclick="onWordClick()" onmouseover="tip(event,'not','bu4','不','不')" onmouseout="htip()">不</span><span onclick="onWordClick()" onmouseover="tip(event,'to like','xi3huan5','喜欢','喜歡')" onmouseout="htip()">喜欢</span><span onclick="onWordClick()" onmouseover="tip(event,'to eat','chi1','吃','吃')" onmouseout="htip()">吃</span><span onclick="onWordClick()" onmouseover="tip(event,'fish','yu2','鱼','魚')" onmouseout="htip()">鱼</span>。                        <br />
                        (He doesn't like to eat fish.)
                    </td>


The extracted data should be:
Code:
- he not to like to eat fish
- ta1 bu4 xi3huan5 chi1 yu2


Can you tell me how to do it? I would appreciate it very much. Thank you! :D


Below is my iMacros information:
1. What version of iMacros are you using?
VERSION BUILD=8970419 RECORDER=FX

2. What operating system are you using? (please also specify language)
Windows 7 Ultimate x64

3. Which browser(s) are you using? (include version numbers)
Firefox 46.0.1

4. Do the included demo macros work ok?
Yes
newmember
 
Posts: 20
Joined: Mon Mar 17, 2014 12:01 am

Re: Extracting Text from Onmouseover

by chivracq on Mon May 09, 2016 2:27 am


Hum..., interesting Case and Site. I had a quick look at your Site, and achieving what you want will be a bit tricky indeed, though not "gigantically" complicated if you understand the Method I use, for which I've already posted several Examples on the Forum...

In a Nutshell, all the Data you are after is contained in the 'EXTRACT=HTM' on the 'TD' Element, each Group in its own 'SPAN' Element, and you can isolate the exact Data that you want to keep using 'EVAL()' + 'split()' (x2 per (Group of) Letter(s)).
The mini-complicating bit is that the Phrases are not all of the same Length (between 1 and approx 20 Groups on the Page you've provided). If you go for 1 Group of Letters per Loop, you would need to loop your Macro, and saving the Data on the final Loop will require a Temp File or the Clipboard if you want it all together in one Row in your '.CSV' 'SAVEAS'..., or you'd already need Nested Loops for 1 Sentence. And I guess you will want your Macro to extract the Data for the whole Page in one go, which will require a Nested Loop in Nested Loops (3 Levels).
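
A minimal sketch of the chained 'split()' idea, written as plain JavaScript (the language 'EVAL()' evaluates). The sample string and the helper name tipField are assumptions made for illustration and are not taken from any referenced Thread; in a Macro the string would come from the extraction instead.

```javascript
// Hypothetical sample: the HTML that 'EXTRACT=HTM' might return for one SPAN.
const spanHtml =
  "<span onclick=\"onWordClick()\" " +
  "onmouseover=\"tip(event,'to like','xi3huan5','喜欢','喜歡')\" " +
  "onmouseout=\"htip()\">喜欢</span>";

// tipField is a hypothetical helper name.
// 1st split(): keep what follows "tip(event,'".
// 2nd split(): cut at the closing "')" to keep only the argument list.
// A final split() on "','" turns the argument list into separate fields.
function tipField(html, index) {
  const args = html.split("tip(event,'")[1].split("')")[0];
  return args.split("','")[index];
}

const meaning = tipField(spanHtml, 0); // "to like"
const pinyin = tipField(spanHtml, 1);  // "xi3huan5"
console.log(meaning, pinyin);
```

The sketch assumes every SPAN carries a tip(event, ...) call with the meaning first and the pinyin second, as in the HTML quoted in the first post.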

Instead of using (Nested) Loops, you could hardcode about 20 Blocks, which seems to correspond to the Max Number of Letter Groups per Sentence, and, still using 'EVAL()', output either the Data you want to isolate or an Empty String once you've reached the End of the Sentence. The 'length' of the 'split()' Array on "<span" (-1) will give you the Number of Letter Groups in each Sentence.
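
The counting trick can be sketched roughly as follows, again in plain JavaScript. The shortened sample HTML and the name fieldOrEmpty are assumptions for illustration. The sketch splits on the opening tag "<span", because splitting on the bare word "span" would also hit every closing tag.

```javascript
// Hypothetical, shortened sample of one TD's 'EXTRACT=HTM' output (attributes trimmed).
const tdHtml =
  "<span onmouseover=\"tip(event,'he','ta1','他','他')\">他</span>" +
  "<span onmouseover=\"tip(event,'not','bu4','不','不')\">不</span>" +
  "<span onmouseover=\"tip(event,'fish','yu2','鱼','魚')\">鱼</span>。<br />";

// Splitting on the opening tag gives one leading piece plus one piece per group,
// so the number of groups is length - 1.
const pieces = tdHtml.split("<span");
const groupCount = pieces.length - 1;

// One "hardcoded block" for group n (1-based): return the requested field,
// or an empty string once n is past the end of the sentence.
function fieldOrEmpty(n, index) {
  if (n > groupCount) return "";
  const args = pieces[n].split("tip(event,'")[1].split("')")[0];
  return args.split("','")[index];
}

console.log(groupCount);         // 3
console.log(fieldOrEmpty(1, 0)); // "he"
console.log(fieldOrEmpty(4, 0)); // "" (no 4th group in this sentence)
```

With about 20 such calls per sentence, the unused ones simply return empty strings, which is what lets a single non-looping Macro cope with sentences of different lengths.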

But, pfff, there are several Solutions: it's also possible to extract the same 'EXTRACT=HTM' Data from each 'SPAN' Element within a 'TD' Element (= one Sentence), using Relative Positioning to check if you've reached the End of the Sentence.

But hum, not sure you will understand everything I am trying to explain, I'm afraid it might still be a bit Complex..., oops...!
Well, good luck anyway...!, but you'll need to be very Systematic in your Macro...
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6474
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Extracting Text from Onmouseover

by newmember on Wed May 11, 2016 8:26 pm

Yes, I don't understand everything you said :mrgreen: Thank you for the very detailed explanation. I will try it myself :)

Re: Extracting Text from Onmouseover

by chivracq on Thu May 12, 2016 10:17 am

Yep, I realize, but it comes from the Site, which uses a complex HTML Structure to achieve this Horizontal/Vertical Presentation in a Mouseover, with Variable Lengths both for the Number of Letters within one "Group of Letters" and for the Number of Groups within one Sentence...

Maybe you or some other (Advanced) User will come up with a simpler Method than mine. If you start digging into the Code, post it once you've made a start and really get stuck, and then I'm willing to help you further...

If you go using (some parts of) my Method, here are already a few Threads that might help you understand the few Techniques I mentioned, with working Examples I previously posted:
- Re: Number of Options in a Select tag (About using the Length of the Array on 'split()' to determine the Number of 'Groups of Letters' in a Sentence...)
- Re: How to extract a variable number (On how I use 'EVAL()' + a Double 'split()' to isolate some (Dynamic) Data from some 'EXTRACT'. It is 'TXT' in that Example, but it's the same Principle for 'HTM'...)
The only (extra) Difficulty in this last Thread will be handling the Single Quotes using a mix of (Escaped) Double Quotes and Single Quotes either directly in the 'split()' or with an extra 'replace()' as I always (try to) use Single Quotes in my Examples if possible, instead of the "official" way with Escaped Double Quotes...

>>>

Hum..., interesting Case I said, ah-ah...!, I'm actually thinking of another Method that I used in another Thread and that might be a bit simpler (even if the User in that Thread called it "ingenious", :oops: ), where I used a Double Extraction on 2 URLs (or twice on the same one) to compare the Results for some further Conditional Logic... 8)
In your Case it would be a Double Extraction on each 'SPAN' Element (= Group of Letters): once on 'HTM' to get the Info from the Mouseover, and once on 'TXT' to get the Chinese Letters, which you then reuse with 'EVAL()' in the 'HTM' Extract to relatively locate the Data you are interested in, with 'split()', 'str()', 'indexOf()', etc... Fairly simple to implement, I would think... And you could simply loop your Macro on one Group of Letters (+ Temp File or Clipboard) or on one Sentence, for the whole Page, with a Conditional 'SAVEAS' each time you've reached the End of a Phrase. That way you avoid the fairly complex Concept of the 3 Levels of Nested Loops... :idea:
- Help on TAG or SEARCH
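
One possible reading of the Double Extraction idea, sketched in plain JavaScript: the 'TXT' value is used as a landmark inside the 'HTM' value, and 'indexOf()' locates the Data sitting just before it. The sample values and the name fieldsBefore are assumptions for illustration and are not the code from the linked Thread.

```javascript
// Hypothetical results of the two extractions on the same SPAN.
const htm =
  "<span onclick=\"onWordClick()\" " +
  "onmouseover=\"tip(event,'fish','yu2','鱼','魚')\" " +
  "onmouseout=\"htip()\">鱼</span>";
const txt = "鱼";

// fieldsBefore is a hypothetical helper name.
// The meaning and pinyin sit between "tip(event,'" and the quoted Chinese text,
// so both boundaries can be found with indexOf().
function fieldsBefore(htmValue, txtValue) {
  const opener = "tip(event,'";
  const start = htmValue.indexOf(opener);
  const end = htmValue.indexOf("','" + txtValue + "'");
  if (start === -1 || end === -1) return null; // landmark not found
  const parts = htmValue.slice(start + opener.length, end).split("','");
  return { meaning: parts[0], pinyin: parts[1] };
}

console.log(fieldsBefore(htm, txt)); // { meaning: 'fish', pinyin: 'yu2' }
```

Returning null when the landmark is missing gives the Macro something to test for a Conditional 'SAVEAS' at the End of a Sentence.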

OK, good luck, and I'll be curious to see what you come up with, ah-ah...! :twisted:

Re: Extracting Text from Onmouseover

by NoraChoi on Wed May 25, 2016 11:45 pm

It would be better to have a tool to extract the content for you. :D

You can extract data (or information) from Onmouseover in HTML source with Octoparse (http://www.octoparse.com/?=bl). It can extract any HTML text for you. Easy to use and free. :D
NoraChoi
 
Posts: 6
Joined: Wed May 25, 2016 11:14 pm

Re: Extracting Text from Onmouseover

by chivracq on Thu May 26, 2016 8:40 am


Hum, looks like an interesting Tool indeed, I had never heard about it...
I will give it a try even if I'm always a bit "suspicious" about the Quality of Software when I see all the grammar + spelling mistakes on the FAQ...

Re: Extracting Text from Onmouseover

by NoraChoi on Thu May 26, 2016 7:43 pm

Hi chivracq.
:shock: I will edit the FAQ... :wink:
Just try Octoparse (download it at http://www.octoparse.com/download/) and extract what you want.
Tutorials for you are here: http://www.octoparse.com/Tutorial
Try it now. :D I am always happy to help~

Re: Extracting Text from Onmouseover

by janib4all on Tue Jun 07, 2016 1:49 pm

Try this:

' {{!EXTRACT}} is assumed to hold the long HTML text of the TD (from an 'EXTRACT=HTM')

SET FinalText EVAL("var a = '{{!EXTRACT}}'.replace(/<[^>]+>/g, '').replace(/^\s+|\s{2,}|\s+$/g, ''); a;")
PROMPT {{FinalText}}

This will convert the whole HTML into:
他不喜欢吃鱼。(He doesn't like to eat fish.)

If you need the content inside the ( ), add these lines afterward:

SET InsideBraces EVAL("var a = '{{FinalText}}'.match(/\((.+)\)/); a[1];")
PROMPT {{InsideBraces}}
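
The two regular-expression steps above can be checked outside iMacros with a short JavaScript sketch. The sample HTML is a shortened, hypothetical version of the TD from the first post, and the variable names are illustrative only. The second pattern uses `.+` (any characters) between the parentheses.

```javascript
// Hypothetical, shortened copy of the TD's HTML, including its line breaks and indentation.
const extract =
  "<td width=\"80%\">\n    " +
  "<span onmouseover=\"tip(event,'he','ta1','他','他')\">他</span>" +
  "<span onmouseover=\"tip(event,'fish','yu2','鱼','魚')\">鱼</span>。    <br />\n" +
  "    (He doesn't like to eat fish.)\n</td>";

// Step 1: drop every tag, then drop leading/trailing whitespace
// and any run of two or more whitespace characters.
const finalText = extract
  .replace(/<[^>]+>/g, "")
  .replace(/^\s+|\s{2,}|\s+$/g, "");

// Step 2: capture whatever sits between the parentheses.
const m = finalText.match(/\((.+)\)/);
const insideParens = m ? m[1] : "";

console.log(finalText);    // 他鱼。(He doesn't like to eat fish.)
console.log(insideParens); // He doesn't like to eat fish.
```

Note that this route yields the visible sentence and its translation, not the per-word meaning and pinyin stored in the onmouseover attributes, since those are removed together with the tags.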
Hire the BoT-fReeak!
botspecialist.blogspot.com
janib4all
 
Posts: 132
Joined: Tue Jul 20, 2010 11:44 pm
Location: Karachi, Sindh, Pakistan

