Issue Extract with firefox plugin

by Felipe1 on Mon May 15, 2017 1:27 am

Browser: Firefox 53.0.2 (32 bit)
OS: Windows 8.1 Pro
Plugin: iMacros for Firefox 9.0.3 (updated April 24, 2017)

-----------------------

Hello,

First of all, sorry for my bad English.

I'm having an issue extracting TXT when there are bold or italic characters in the middle of the text to extract. It happens with the Firefox plugin.

The iMacros program works properly.

I've changed the HTML of the demo page (http://demo.imacros.net/Automate/Extract2) to test whether the problem was caused by the web page I want to extract the text from.

I've made part of the phrase bold, changing
"The second line is extracted, too." to "The <strong>second line</strong> is extracted, too."

In HTML:

<td style="width: 52%; outline: 1px solid blue;" class="bdytxt">
This line is extracted.<br>
The <strong>second line</strong>
is extracted, too.
</td>

Running this macro:

VERSION BUILD=8031994
TAB T=1
SET !EXTRACT_TEST_POPUP NO
URL GOTO=http://demo.imacros.net/Automate/Extract2
TAG POS=1 TYPE=TD ATTR=CLASS:bdytxt&&TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=+_{{!NOW:yyyymmdd_hhnnss}}

the result is:

This line is extracted.
TheThis line is extracted.
Thesecond lineis extracted, too.

instead of:

This line is extracted.
The second line is extracted, too.


It seems that the extracted text repeats itself when it finds the bold tag (<strong>).
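
For comparison, the expected text of that cell can be worked out outside iMacros. The following is only a minimal Python sketch using the standard library `html.parser`; it is not how iMacros implements EXTRACT=TXT, and the names `CellText` and `cell_text` are hypothetical. It assumes that each text node should be emitted exactly once in document order, that `<br>` starts a new line, and that other whitespace collapses to single spaces.

```python
import re
from html.parser import HTMLParser

# The modified demo cell from the post above.
CELL_HTML = """<td style="width: 52%; outline: 1px solid blue;" class="bdytxt">
This line is extracted.<br>
The <strong>second line</strong>
is extracted, too.
</td>"""


class CellText(HTMLParser):
    """Collects each text node once, in document order (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: only <br> produces a line break in the extracted text.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        # Source newlines and indentation collapse to a single space.
        self.parts.append(re.sub(r"\s+", " ", data))


def cell_text(html):
    parser = CellText()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    # Normalise spacing per line and drop lines that end up empty.
    return [" ".join(line.split()) for line in lines if line.strip()]


print(cell_text(CELL_HTML))
```

Under those assumptions the text inside <strong> is joined to its neighbours with the surrounding spaces intact, and nothing before the tag is repeated.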

Thank you in advance.
Felipe1
 
Posts: 2
Joined: Mon May 15, 2017 1:10 am

Re: Issue Extract with firefox plugin

by chivracq on Mon May 15, 2017 5:49 am

Hum..., sounds like a Bug to me indeed... But..., hum again..., I cannot reproduce, it works fine for me after modifying the Demo Page like you did... I attach my "own" Demo Page to the Thread...

Your Extract Statement:

TAG POS=1 TYPE=TD ATTR=CLASS:bdytxt&&TXT:* EXTRACT=TXT

returns for me the expected Result:

                This line is extracted.
                The second line is extracted, too.

And even the following Extract on the 'STRONG' Element:

'TAG POS=1 TYPE=STRONG ATTR=TXT:second<SP>line EXTRACT=HTM
TAG POS=1 TYPE=STRONG ATTR=TXT:* EXTRACT=TXT

returns the expected Result:

second line

I tested in 2 different FCI's, both with the same Results:
- FCI_1: iMacros for FF v8.8.2, Pale Moon v26.3.3 (=FF47), Win10-x64.
- FCI_2: iMacros for FF v8.9.7, FF51, Win10-x64.

It could be specific to only v9.0.3 that you are using as I tested on 2 earlier Versions of iMacros for FF...
But v9.0.3 is pretty Buggy actually and never really got tested as most "serious" Users quickly reverted to v8.9.7 when v9.0.3 was released, so new Bugs keep slowly reaching the Forum...
But v8.9.7 is much more stable and reliable than v9.0.3, so Advice would be for you to revert to v8.9.7...! (It still works on FF52/53.)
(Make sure to disable Automatic Updates for iMacros otherwise it will update itself back to v9.0.3, ah-ah...!)
Attachment: iMacros - STRONG.zip (82.47 KiB)
chivracq
 
Posts: 6490
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Issue Extract with firefox plugin

by Felipe1 on Tue May 16, 2017 12:12 am

Thank you very much!

Yes, it's a bug in version 9.0.3. After reverting to version 8.9.7, it works fine.

;-)
Felipe1
 
Posts: 2
Joined: Mon May 15, 2017 1:10 am


Return to Data Extraction and Web Screen Scraping

Who is online

Users browsing this forum: No registered users and 5 guests

-->