Extracting text from one textbox into separated lines.

by NaiLeD on Mon Feb 22, 2016 2:41 pm

VERSION BUILD=8.9.6
Windows 7
Mozilla Firefox 44.0.2
I have a textbox that contains this text (example):
facebook.com/12345678910111
facebook.com/23456789101112
facebook.com/34567891011121

After extracting it, I always get this (text without line breaks):
"http://facebook.com/12345678910111http://facebook.com/23456789101112facebook.com/34567891011121"

The problem is that I need iMacros to save the text from the textbox with its line breaks.
I need this because the textbox contains a list of Facebook IDs. After saving them to a txt file, I use that file as the source file for !DATASOURCE, so that I can enter one ID into the Facebook search box on each loop of the macro.

Code:
TAG POS=1 TYPE=TEXTAREA ATTR=ID:text EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=FBID.txt

Code:
SET !DATASOURCE FBID.txt
SET !DATASOURCE_LINE {{!LOOP}}
TAG POS=1 TYPE=INPUT:TEXT ATTR=ID:gedit_users_search_inp CONTENT={{!COL1}}


If anyone could help me, I would appreciate it.
Last edited by NaiLeD on Tue Feb 23, 2016 9:13 pm, edited 3 times in total.

Re: Extractioning text from one textbox into separated lines

by chivracq on Tue Feb 23, 2016 8:08 am

NaiLeD wrote:
VERSION BUILD=8.9.6.1227
Windows 7
Mozilla Firefox
I have a textbox that contains this text (example):
facebook.com/12345678910111
facebook.com/23456789101112
facebook.com/34567891011121

After extracting it, I always get this (text without line breaks):
"http://facebook.com/12345678910111http://facebook.com/23456789101112facebook.com/34567891011121"

The problem is that I need iMacros to save the text from the textbox with its line breaks.
I need this because the textbox contains a list of Facebook IDs. After saving them to a txt file, I use that file as the source file for !DATASOURCE, so that I can enter one ID into the Facebook search box on each loop of the macro.

Code:
TAG POS=1 TYPE=TEXTAREA ATTR=ID:text EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=FBID.txt

Code:
SET !DATASOURCE "C:\\Users\\NaiLeD\\Desktop\\Banlist.txt"
SET !DATASOURCE_LINE {{!LOOP}}
TAG POS=1 TYPE=INPUT:TEXT ATTR=ID:gedit_users_search_inp CONTENT={{!COL1}}


If anyone could help me, I would appreciate it.

FCIM...! :mrgreen:
=> iMacros for FF v8.9.6, FF44...?, Win7.

And "Extractioning" in your Thread Title sounds funny, but you can correct it to "Extracting" so that other Users searching the Forum can locate this Thread...

But OK, I had a look at your Problem, an interesting and tricky Case...! :D
As you didn't post the URL of your Site/Page, I used this current Forum Page and the Quotation Field in your OP as Input Field for my Tests...

The Line-Breaks are still present in an 'EXTRACT=HTM' of the Field:
Extract_TXT:
facebook.com/12345678910111facebook.com/23456789101112facebook.com/34567891011121

Extract_HTM:
<blockquote style="outline: 1px solid blue;" class="uncited"><div>facebook.com/12345678910111<br>facebook.com/23456789101112<br>facebook.com/34567891011121</div></blockquote>

I thought I had a quick Solution for you by doing a Double 'split()' on '<div>' and '</div>', which seemed to work when using a 'PROMPT' for Debug:
Code:
SET Extract_with_BR EVAL("var s='{{!EXTRACT}}'; var y=s.split('<div>'); var x=y[1].split('</div>'); x[0];")

Extract_with_BR:
facebook.com/12345678910111
facebook.com/23456789101112
facebook.com/34567891011121
but nope, the Data is still saved in one Row in the 'SAVEAS':
"facebook.com/12345678910111facebook.com/23456789101112facebook.com/34567891011121"

Grrr...! Not working thus... I tried using the Clipboard as a Temp Storage for the Content before doing the 'SAVEAS', but to no avail, same Result...

It seems that 'SAVEAS' ignores the '<br>' Tags, or maybe they are removed even earlier: as soon as the Data is manipulated in some iMacros Var (using '!EXTRACT' or any User Defined Var), the '<br>' Tags are gone.
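
To make the Double 'split()' easier to follow, here is a minimal sketch of the same logic in plain JavaScript, run outside of iMacros. The sample string is the 'EXTRACT=HTM' Result shown above; the helper name splitIds is hypothetical and not an iMacros function. It only illustrates what the chained 'split()' Calls return and does not reproduce how iMacros handles the '<br>' Tags in its own Vars.

```javascript
// Sample HTM extract copied from the post above.
const extractHtm =
  '<blockquote style="outline: 1px solid blue;" class="uncited"><div>' +
  'facebook.com/12345678910111<br>facebook.com/23456789101112<br>' +
  'facebook.com/34567891011121</div></blockquote>';

// Hypothetical helper (not part of iMacros): return the entries found
// between <div> and </div>, one array element per <br>-separated entry.
function splitIds(html) {
  const afterDiv = html.split('<div>')[1];    // everything after the opening <div>
  const inner = afterDiv.split('</div>')[0];  // everything before the closing </div>
  return inner.split('<br>');                 // one element per original line
}

const ids = splitIds(extractHtm);
console.log(ids.length);      // number of IDs found
console.log(ids.join('\n'));  // one ID per line
```

The array length corresponds to what the Script below stores in Nb_IDs, and each array element corresponds to one FB_IDn.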

I finally found a Solution. If your Number of FB_ID's were fixed, you could just repeat the same Section of Code 3 times, but I guess it will be Dynamic. You would normally handle the Dynamic part and the Dynamic Looping in a main '.js' Script, but I don't like and don't use '.js' Scripts myself, so I included a Dynamic "Internal Loop" handled completely in the '.iim' itself, using Negative Looping and a few other Tricks.

Here is the Script I used. I've left all the Debug Info in it, so you will get 3 Prompts (that you can disable by commenting out the 'PROMPT' Line...), and the Data will be saved by your 'SAVEAS' as:
"facebook.com/12345678910111"
"facebook.com/23456789101112"
"facebook.com/34567891011121"

Code:
VERSION BUILD=8961227 RECORDER=FX
SET !EXTRACT_TEST_POPUP NO
TAB T=1
URL GOTO=http://forum.imacros.net/viewtopic.php?f=7&t=25822

SET !EXTRACT NULL
TAG POS=1 TYPE=BLOCKQUOTE ATTR=TXT:facebook.com/12345678910111facebook.com/23456789101112facebo* EXTRACT=TXT
SET Extract_TXT {{!EXTRACT}}
'=> Extracted:
'=> facebook.com/12345678910111facebook.com/23456789101112facebook.com/34567891011121

SET !EXTRACT NULL
TAG POS=1 TYPE=BLOCKQUOTE ATTR=TXT:facebook.com/12345678910111facebook.com/23456789101112facebo* EXTRACT=HTM
SET Extract_HTM {{!EXTRACT}}
'=> Extracted:
'=> <blockquote style="outline: 1px solid blue;" class="uncited"><div>facebook.com/12345678910111<br>facebook.com/23456789101112<br>facebook.com/34567891011121</div></blockquote>

'Hum, working in 'PROMPT' but not in 'SAVEAS':
SET Extract_with_BR EVAL("var s='{{!EXTRACT}}'; var y=s.split('<div>'); var x=y[1].split('</div>'); x[0];")

'Not working:
'SET Nb_IDs EVAL("var s='{{Extract_with_BR}}'; var y=s.split('<br>'); var x=y.length; x;")
'SET FB_IDn EVAL("var s='{{Extract_with_BR}}'; var y=s.split('<br>'); y[0];")
SET Nb_IDs EVAL("var s='{{!EXTRACT}}'; var y=s.split('<div>'); var x=y[1].split('</div>'); var w=x[0].split('<br>'); var z=w.length; z;")

'PROMPT Extract_TXT:<BR>{{Extract_TXT}}<BR><BR>Extract_HTM:<BR>{{Extract_HTM}}<BR><BR>Extract_with_BR:<BR>{{Extract_with_BR}}<BR><BR>Nb_IDs:<SP>{{Nb_IDs}}
'PAUSE

'Not working in 'SAVEAS', directly or with Clipboard:
'SET !CLIPBOARD {{Extract_with_BR}}
'SET !EXTRACT {{!CLIPBOARD}}
'SAVEAS TYPE=EXTRACT FOLDER=* FILE=FBID.txt

'Using '!LOOP' (with Negative Numbers) to loop through the FB_ID's + 1 'SAVEAS' per FB_ID:
'Computing '!LOOP' (only used in first Run):
SET Internal_Loop 2
ADD Internal_Loop -{{Nb_IDs}}
SET !LOOP {{Internal_Loop}}
'>
SET FB_ID_Loop {{Nb_IDs}}
ADD FB_ID_Loop {{!LOOP}}
ADD FB_ID_Loop -2
SET FB_IDn EVAL("var n='{{FB_ID_Loop}}'; var s='{{!EXTRACT}}'; var y=s.split('<div>'); var x=y[1].split('</div>'); var z=x[0].split('<br>'); z[n];")

SET Debug_Info Extract_TXT:<BR>{{Extract_TXT}}<BR><BR>Extract_HTM:<BR>{{Extract_HTM}}<BR><BR>Extract_with_BR:<BR>{{Extract_with_BR}}
ADD Debug_Info <BR><BR>Nb_IDs:<SP>{{Nb_IDs}}<BR>Internal_Loop<SP>(=2-Nb_IDs):<SP>{{Internal_Loop}}<BR>LOOP:<SP>{{!LOOP}}
ADD Debug_Info <BR><BR>FB_ID_Loop<SP>('split(n)'):<SP>{{FB_ID_Loop}}<BR>FB_IDn:<SP>{{FB_IDn}}
PROMPT {{Debug_Info}}
'PAUSE

SET !EXTRACT {{FB_IDn}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=FBID.txt
(Tested on iMacros for FF v8.9.6, FF44, Win10-x64.)
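
The Negative Looping arithmetic in the Script can be checked with a small simulation in plain JavaScript. This is only a sketch: the helper name loopPlan is hypothetical, and it assumes that the Macro is played in Loop Mode with a Max of 1 and that iMacros increases '!LOOP' by 1 on each Run after the initial 'SET !LOOP'.

```javascript
// Hypothetical helper mirroring the macro's arithmetic; not part of iMacros.
function loopPlan(nbIds) {
  // SET Internal_Loop 2 / ADD Internal_Loop -{{Nb_IDs}}
  const firstLoop = 2 - nbIds;
  const plan = [];
  // Assumption: !LOOP goes up by 1 per run until it reaches the Max of 1.
  for (let loop = firstLoop; loop <= 1; loop++) {
    // SET FB_ID_Loop {{Nb_IDs}} / ADD FB_ID_Loop {{!LOOP}} / ADD FB_ID_Loop -2
    const index = nbIds + loop - 2;
    plan.push({ loop: loop, index: index });
  }
  return plan;
}

const plan = loopPlan(3);
console.log(plan);
```

With 3 IDs, '!LOOP' starts at -1 and FB_ID_Loop starts at 0, so under these assumptions each Run picks the next Element of the 'split()' Array.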

EDIT: But hum, the first Method (directly or with the Clipboard) actually works: the Line-Breaks from the '<br>' Tags are saved by the 'SAVEAS'. The Problem seems to be that Notepad does not see/interpret them correctly; it works if you open the .CSV in WordPad instead of Notepad. You can then do a quick Copy&Paste from WordPad into another (New) Notepad File (and also remove the enclosing Double Quotes)...:
"facebook.com/12345678910111
facebook.com/23456789101112
facebook.com/34567891011121"
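
The Copy&Paste step could also be done with a few Lines of JavaScript. The sketch below rests on an assumption that this Thread does not confirm: that the saved Field contains bare line-feed ("\n") Breaks, which older Notepad Versions are known to display as a single Line. The helper name toNotepadLines is hypothetical.

```javascript
// Assumption: the saved file holds one quoted field with bare "\n" breaks.
const saved =
  '"facebook.com/12345678910111\n' +
  'facebook.com/23456789101112\n' +
  'facebook.com/34567891011121"';

// Hypothetical helper: strip the enclosing double quotes and
// rewrite every line break as CRLF so that Notepad shows separate lines.
function toNotepadLines(field) {
  const unquoted = field.replace(/^"|"$/g, ''); // drop leading/trailing quote
  return unquoted.split(/\r?\n/).join('\r\n');  // normalize LF or CRLF to CRLF
}

const fixed = toNotepadLines(saved);
console.log(fixed);
```

If the Breaks turn out to be stored differently, only the 'split()' Pattern would need to change.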


But that still doesn't explain why this Line was not working:
Code:
'SET Nb_IDs EVAL("var s='{{Extract_with_BR}}'; var y=s.split('<br>'); var x=y.length; x;")
('EVAL()' and 'split()' were not seeing the '<br>' Tags anymore...)
But OK, you've got a working Solution anyway, enjoy...!
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...

Re: Extracting text from one textbox into separated lines.

by NaiLeD on Tue Feb 23, 2016 10:21 pm

chivracq wrote:The Line-Breaks are still present in an 'EXTRACT=HTM' of the Field

OK, the problem is that this text field is located in a frame. The frame has no number (like frame F=0), only a dynamic name, so I decided to use frame name=f*, because the name always starts with "f" and it is the only frame on the page. The frame points to an SFW application that gives me the IDs. Every time, it creates a blank textarea and pastes the IDs into it. That's why I got this after the HTM extract:
"<textarea id=""text"" style=""outline: 1px solid blue;""></textarea>"

However, I can get the list of IDs without "facebook.com/", and it looks like a comma-separated list:
12345678910111,23456789101112,34567891011121
So, I suppose I can use the last code you created and change "<br>" to ",". I have only one question: what am I supposed to use instead of "<div>" and "</div>"?
Code:
SET Internal_Loop 2
ADD Internal_Loop -{{Nb_IDs}}
SET !LOOP {{Internal_Loop}}

SET FB_ID_Loop {{Nb_IDs}}
ADD FB_ID_Loop {{!LOOP}}
ADD FB_ID_Loop -2
SET FB_IDn EVAL("var n='{{FB_ID_Loop}}'; var s='{{!EXTRACT}}'; var y=s.split('<div>'); var x=y[1].split('</div>'); var z=x[0].split('<br>'); z[n];")

SET !EXTRACT {{FB_IDn}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=FBID.txt
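
One untested possibility, sketched here in plain JavaScript rather than in the macro itself: a comma-separated text extract has no HTML wrapper, so the "<div>" and "</div>" splits may simply be unnecessary and a single split on "," may be enough. The sample string is the list from this post; the helper name splitCommaIds is hypothetical, and nothing in this thread confirms that the same change works inside EVAL().

```javascript
// Sample comma-separated extract copied from the post above.
const extractTxt = '12345678910111,23456789101112,34567891011121';

// Hypothetical helper: split on commas, trim stray whitespace,
// and drop empty entries (for example after a trailing comma).
function splitCommaIds(text) {
  return text
    .split(',')
    .map(function (id) { return id.trim(); })
    .filter(function (id) { return id.length > 0; });
}

const commaIds = splitCommaIds(extractTxt);
console.log(commaIds.length);      // number of IDs
console.log(commaIds.join('\n'));  // one ID per line
```

In the macro, this would correspond to replacing the chained splits with one split(',') on the extracted text, with the array length used for Nb_IDs and the indexed element used for FB_IDn.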


P.S. Thank you very much for the reply.
P.P.S. I'm from Ukraine, and English isn't my main language, that's why I've made so many mistakes. But I'm trying to improve my English.
