Extracting first of two lines

zirjeo
Posts: 66
Joined: Fri Oct 28, 2016 1:49 am

Extracting first of two lines

Post by zirjeo » Wed Nov 02, 2016 3:46 am

Windows 10, IE11, VERSION BUILD=11.5.498.2403

When I extract the first person's name below (John Doe), it also grabs the second line (Ann Doe). I'm trying to extract just John Doe.

It shows like this on the website:
John Doe
Ann Doe

The HTML looks like this:
'<td><span id="ctl00_ContentPlaceHolder1_lblBorrowers" style="border: 1px solid green;
'background-color: magenta;">John Doe<br>Ann Doe</span></td>

I was attempting something along these lines, but I don't think I'm even close:

TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ContentPlaceHolder1_lblBorrowers EXTRACT=TXT
SET !VAR1 EVAL("var s=\"{{!EXTRACT}}\"; s.split('<br>')[0];")

I tried '</br>', '<br/>', etc., but now I'm thinking <br> has nothing to do with it.

Any idea?

Thanks
Windows 10, IE11, VERSION BUILD=11.5.498.2403
chivracq
Posts: 7821
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting first of two lines

Post by chivracq » Wed Nov 02, 2016 12:42 pm

Try this:

TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ContentPlaceHolder1_lblBorrowers EXTRACT=HTM
SET !VAR1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; y=s.split('>'); z=y[1].split('<'); z[0];")
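
For anyone following along, here is a minimal plain-JavaScript sketch of what the EVAL expression above does, run outside iMacros. The `extracted` string is an assumption: it stands in for what `{{!EXTRACT}}` would hold after `EXTRACT=HTM` on the 'SPAN', rebuilt from the HTML in the first post (the browser may format the attributes differently). The helper name `firstLine` is hypothetical.

```javascript
// Assumed content of {{!EXTRACT}} after EXTRACT=HTM on the SPAN,
// rebuilt from the HTML quoted in the first post.
const extracted =
  '<span id="ctl00_ContentPlaceHolder1_lblBorrowers" ' +
  'style="border: 1px solid green; background-color: magenta;">' +
  'John Doe<br>Ann Doe</span>';

// Hypothetical helper mirroring the EVAL: keep the text that sits
// between the first '>' and the next '<'.
function firstLine(html) {
  // Pieces: '<span ...', 'John Doe<br', 'Ann Doe</span', ''
  const y = html.split('>');
  // Pieces of y[1]: 'John Doe', 'br'
  const z = y[1].split('<');
  return z[0];
}

console.log(firstLine(extracted)); // John Doe
```

The split on '>' works here only because the extract starts at the 'SPAN', so the first '>' is the one that closes the opening tag.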
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...

Re: Extracting first of two lines

Post by zirjeo » Wed Nov 02, 2016 2:04 pm

That actually worked, thank you! I didn't realize it would ignore the > in <td> at the start of the HTM; I guess it's only looking at everything inside <td></td>.

To get around that, I had previously tried:

SET !VAR1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; y=s.split('a;">'); z=y[1].split('<br>'); z[0];")

but the " symbol inside the first split was throwing it off.

Just curious: say there was another > symbol before the one we needed, is there any way to make it use the second > symbol?

Re: Extracting first of two lines

Post by chivracq » Wed Nov 02, 2016 2:34 pm

Youdipoo...! Good to hear that it works, ah-ah...! Well, I had a good "feeling", I didn't even test it...

Well, your TAG Statement was on the 'SPAN', which is inside the 'TD', so it only extracts at the 'SPAN' Level. If you had extracted at the 'TD' Level, you would get the 'TD' + 'SPAN' HTML Data.

Yep, I deliberately avoided the _"_ (Double Quote) in my 'split()', foreseeing that this Char could be problematic and would require some Escaping...

The "0" / "1" Values in "y[1]" / "z[0]" indicate which n_th part of the Array returned by 'split()' you want to keep, 0 is the first part, 1 is the second part, etc...
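
As a rough illustration of those Indexes and of the "second > symbol" Question, here is a plain-JavaScript sketch (not an iMacros Macro, and not tested in iMacros). It assumes the Extraction was done at the 'TD' Level, so there is one extra '>' before the Name; the sample String is rebuilt from the HTML in the first Post with the 'style' Attribute left out, and `textAfterNthGt` is a hypothetical Helper.

```javascript
// Assumed TD-level extract (style attribute omitted for readability).
const tdHtml =
  '<td><span id="ctl00_ContentPlaceHolder1_lblBorrowers">' +
  'John Doe<br>Ann Doe</span></td>';

// Hypothetical helper: return the text that follows the n-th '>'
// (1-based) and ends at the next '<'. Returns '' when n is out of range.
function textAfterNthGt(html, n) {
  const parts = html.split('>');
  if (n < 1 || n >= parts.length) return '';
  return parts[n].split('<')[0];
}

console.log(textAfterNthGt(tdHtml, 1)); // '' (nothing between <td> and <span ...>)
console.log(textAfterNthGt(tdHtml, 2)); // John Doe
console.log(textAfterNthGt(tdHtml, 3)); // Ann Doe
```

In the EVAL from the earlier Post, skipping one extra '>' would correspond to using y[2] instead of y[1].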

Re: Extracting first of two lines

Post by zirjeo » Wed Nov 02, 2016 2:59 pm

Oh, so that's what SPAN does, okay, thank you! Is there any documentation that explains what all the symbols such as / ? ' \ do in split, replace, join, etc.? I searched for a while but couldn't find anything.

Re: Extracting first of two lines

Post by chivracq » Wed Nov 02, 2016 3:16 pm

Euh, no, it's not "what SPAN does"; 'SPAN' is an HTML Element Type, like 'TD', 'DIV', 'TABLE', 'BODY', etc...
More Info on the following Wiki Pages:
- TAG parameters explained (+ External Links)
- TAGs and HTML

Yep, I don't really master all the Syntax with those /?'\ etc. Characters myself either, which is why I always use 'split()'. You will find the best Examples and Documentation by searching on "Global 'replace()'" + 'match()' and 'REGEXP'...
And this is a useful Resource for 'REGEX': http://regexr.com/
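
For Comparison, a small plain-JavaScript sketch of the 'match()' and Global 'replace()' Route for the same Extraction. The sample String and both Patterns are Assumptions based on the HTML posted earlier in the Thread, and none of this was tested in iMacros; inside an EVAL, the Backslashes and Quotes would most likely need extra Escaping.

```javascript
// Assumed SPAN-level extract (style attribute omitted for readability).
const html =
  '<span id="ctl00_ContentPlaceHolder1_lblBorrowers">' +
  'John Doe<br>Ann Doe</span>';

// match(): capture the text between the opening tag's '>' and the
// first <br>, <br/> or <br /> (case-insensitive).
const m = html.match(/>([^<]*)<br\s*\/?>/i);
const first = m ? m[1].trim() : '';

// Global replace(): turn every <br> variant into a newline, strip the
// remaining tags, then split into one entry per displayed line.
const lines = html
  .replace(/<br\s*\/?>/gi, '\n')
  .replace(/<[^>]+>/g, '')
  .split('\n');

console.log(first); // John Doe
console.log(lines); // [ 'John Doe', 'Ann Doe' ]
```

In the Patterns, the / Characters delimit the REGEX, \ escapes the Character that follows, and ? makes the preceding Item optional.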

Re: Extracting first of two lines

Post by zirjeo » Wed Nov 02, 2016 4:07 pm

Ok. Thank you!