Extracting first of two lines

by zirjeo on Tue Nov 01, 2016 8:46 pm

Windows 10, IE11, VERSION BUILD=11.5.498.2403

When I extract the first person's name below, John Doe, it's also grabbing the second line, Ann Doe. I'm trying to extract just John Doe.

It shows like this on the website...
John Doe
Ann Doe

The HTML looks like this...
<td><span id="ctl00_ContentPlaceHolder1_lblBorrowers" style="border: 1px solid green; background-color: magenta;">John Doe<br>Ann Doe</span></td>

I was attempting something along these lines, but I don't think I'm even close:
Code:
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ContentPlaceHolder1_lblBorrowers EXTRACT=TXT
SET !VAR1 EVAL("var s=\"{{!EXTRACT}}\"; s.split('<br>')[0];")


I tried '</br>', '<br/>', etc., but now I'm thinking <br> has nothing to do with it.
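A hedged JavaScript sketch of why that split finds nothing, assuming EXTRACT=TXT returns the span's text with the tags already removed (the newline between the two names is an assumption; iMacros may use a different separator). If the string handed to split() contains no '<br>', split() returns a one-element array, so [0] is the whole string. The HTML version (what EXTRACT=HTM returns) does contain the tag.

```javascript
// Hypothetical TXT extraction: tags stripped, names separated by a newline (assumed).
const txt = "John Doe\nAnn Doe";

// split() on a delimiter that never occurs returns the original string as the only element.
const parts = txt.split("<br>");
console.log(parts.length); // 1
console.log(parts[0] === txt); // true

// The span HTML from the question still contains the <br>, so the same split finds it.
const htm =
  '<span id="ctl00_ContentPlaceHolder1_lblBorrowers" style="border: 1px solid green; ' +
  'background-color: magenta;">John Doe<br>Ann Doe</span>';
console.log(htm.split("<br>").length); // 2
```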

Any idea?

Thanks
Windows 10, IE11, VERSION BUILD=11.5.498.2403
zirjeo
 
Posts: 59
Joined: Thu Oct 27, 2016 6:49 pm

Re: Extracting first of two lines

by chivracq on Wed Nov 02, 2016 5:42 am

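Try this:
Code:
' Extract the HTML of the SPAN (not just its text), so the <br> and the other tags are kept
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ContentPlaceHolder1_lblBorrowers EXTRACT=HTM
' Split on '>' and keep part 1 ("John Doe<br"), then split that on '<' and keep part 0 ("John Doe")
SET !VAR1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; y=s.split('>'); z=y[1].split('<'); z[0];")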
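The EVAL expression is plain JavaScript, so its steps can be traced outside iMacros. A minimal sketch, assuming EXTRACT=HTM returns the span HTML exactly as quoted in the question (on one line, with no '>' inside the attribute values); the helper name firstLine is made up for illustration.

```javascript
// Span HTML as quoted in the question (assumed to be what EXTRACT=HTM returns).
const htm =
  '<span id="ctl00_ContentPlaceHolder1_lblBorrowers" style="border: 1px solid green; ' +
  'background-color: magenta;">John Doe<br>Ann Doe</span>';

// Hypothetical helper mirroring the EVAL: y = s.split('>'); z = y[1].split('<'); z[0]
function firstLine(s) {
  const y = s.split(">"); // ['<span ... magenta;"', 'John Doe<br', 'Ann Doe</span', '']
  const z = y[1].split("<"); // ['John Doe', 'br']
  return z[0];
}

console.log(firstLine(htm)); // John Doe
```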
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6131
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Extracting first of two lines

by zirjeo on Wed Nov 02, 2016 7:04 am

That actually worked, thank you! I didn't realize it would ignore the > in <td> at the start of the HTML. I guess it's only looking at everything inside <td></td>.

To get around that, I had previously tried:
Code:
SET !VAR1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; y=s.split('a;">'); z=y[1].split('<br>'); z[0];")

but the " (double quote) inside the first split() was throwing it off.

Just curious: say there were another > symbol before the one we needed. Is there any way to make it use the second > symbol instead?

Re: Extracting first of two lines

by chivracq on Wed Nov 02, 2016 7:34 am

Youdipoo...! Good to hear that it works, ah-ah...! Well, I had a good 'feeling', I didn't even test it...

Well, your TAG Statement was on the 'SPAN', which is inside the 'TD', so it only extracts at the 'SPAN' Level. If you had extracted at the 'TD' Level, you would get the 'TD' + 'SPAN' HTML Data.

Yep, I deliberately avoided the _"_ (Double Quote) in my 'split()', foreseeing that this Char could be problematic: the whole EVAL() Expression is itself wrapped in Double Quotes, so a _"_ inside it would require some Escaping...
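In plain JavaScript, the marker-based split from the earlier attempt (on 'a;">') is itself fine; the trouble in the macro is only that the " collides with the double quotes around EVAL("..."). A hedged sketch of the marker approach outside iMacros, assuming the span HTML from the question; the helper between() is hypothetical, and inside a macro the double quote would still need escaping.

```javascript
// Span HTML as quoted in the question (assumed EXTRACT=HTM output).
const htm =
  '<span id="ctl00_ContentPlaceHolder1_lblBorrowers" style="border: 1px solid green; ' +
  'background-color: magenta;">John Doe<br>Ann Doe</span>';

// Hypothetical helper: text between the first startMarker and the next endMarker
// (assumes the start marker occurs only once).
function between(s, startMarker, endMarker) {
  const parts = s.split(startMarker);
  if (parts.length < 2) return ""; // start marker not found
  return parts[1].split(endMarker)[0];
}

// The start marker contains a double quote; in a single-quoted JS literal that is harmless.
console.log(between(htm, 'a;">', "<br>")); // John Doe
```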

The "1" / "0" Values in "y[1]" / "z[0]" indicate which n-th part of the Array returned by 'split()' you want to keep: 0 is the first part, 1 is the second part, etc. So if there were one more '>' before the one you need, you would take "y[2]" instead of "y[1]".
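A minimal JavaScript sketch of that indexing, assuming the extraction is done at the 'TD' Level, so the HTML is the '<td>' + '<span>' markup quoted in the question (one extra '>' in front of the name). The helper nthPiece and its index parameter are hypothetical names for illustration.

```javascript
// TD-level HTML from the question (assumed EXTRACT=HTM output on the TD).
const tdHtm =
  '<td><span id="ctl00_ContentPlaceHolder1_lblBorrowers" style="border: 1px solid green; ' +
  'background-color: magenta;">John Doe<br>Ann Doe</span></td>';

// Hypothetical helper: keep piece n of split('>'), then cut it at the next '<'.
function nthPiece(s, n) {
  const y = s.split(">"); // ['<td', '<span ... magenta;"', 'John Doe<br', 'Ann Doe</span', '</td', '']
  if (n >= y.length) return ""; // index past the end of the array
  return y[n].split("<")[0];
}

console.log(nthPiece(tdHtm, 2)); // John Doe
console.log(nthPiece(tdHtm, 3)); // Ann Doe
```

With the extra '<td>' in front, index 1 now lands on the span's opening tag and returns an empty string, which is why the index has to move up by one.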

Re: Extracting first of two lines

by zirjeo on Wed Nov 02, 2016 7:59 am

Oh, that's what SPAN does. Okay, thank you! Is there any documentation that explains what all the symbols such as / ? ' \ do in split, replace, join, etc.? I searched for a while but couldn't find anything.

Re: Extracting first of two lines

by chivracq on Wed Nov 02, 2016 8:16 am

Euh, no, it's not "what SPAN does": 'SPAN' is a Type of HTML Element, like 'TD', 'DIV', 'TABLE', 'BODY', etc...
More Info on the following Wiki Pages:
- TAG parameters explained (+ External Links)
- TAGs and HTML

Yep, for all the Syntax with those /?'\ etc. Characters, which I don't really master myself either (that's why I always use 'split()'), you'll find the best Examples and Documentation by searching for "Global 'replace()'" + 'match()' and 'REGEXP'...
And this is a useful Resource for 'REGEX': http://regexr.com/
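For the 'replace()' / 'match()' route, a hedged sketch of two regex variants run on the span HTML from the question: match() with a capture group for the text right after the first '>', and a global replace() (the g flag) that turns each <br> into a newline and strips the remaining tags. These regexes are illustrative assumptions, not from the thread, and they will not cope with '>' inside attribute values. In them, \s means whitespace, ? makes the preceding item optional, and \/ is an escaped slash.

```javascript
// Span HTML as quoted in the question (assumed EXTRACT=HTM output).
const htm =
  '<span id="ctl00_ContentPlaceHolder1_lblBorrowers" style="border: 1px solid green; ' +
  'background-color: magenta;">John Doe<br>Ann Doe</span>';

// match(): capture the run of non-'<' characters right after the first '>'.
const m = htm.match(/>([^<]*)</);
const viaMatch = m ? m[1] : "";

// Global replace(): <br>, <br/> or <br /> becomes a newline, every other tag is removed.
const plain = htm.replace(/<br\s*\/?>/gi, "\n").replace(/<[^>]*>/g, "");
const lines = plain.split("\n"); // ['John Doe', 'Ann Doe']

console.log(viaMatch); // John Doe
console.log(lines[0]); // John Doe
```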

Re: Extracting first of two lines

by zirjeo on Wed Nov 02, 2016 9:07 am

Ok. Thank you!

