Extracting a table when two columns share same element name


by tennisdude on Thu Mar 30, 2017 9:52 am

Hi,

I'm trying to extract a table with 7 columns of data.

Two of the columns I'm trying to extract share the same class, "CLASS:player-name", and I have to apply a regular expression to those values to remove additional data I do not need.

When I loop on the POS of the individual elements, the script writes the same player name into both columns of the CSV file.

I've tried to get the script to use relative extraction and have had no luck.

Could some of the iMacros wizards help me or point me in the right direction?

Cheers
TD


Code: Select all

VERSION BUILD=10022823
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=https://matchstat.com/tennis/all-upcoming-matches

TAG POS=3 TYPE=TD ATTR=CLASS:round EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:event-name EXTRACT=TXT

TAG POS=5 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")

TAG POS=6 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")

TAG POS=3 TYPE=TD ATTR=CLASS:odds-td<SP>odds-0 EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:odds-td<SP>odds-1 EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:h2h EXTRACT=TXT

SAVEAS TYPE=EXTRACT FOLDER=* FILE=T1.csv


Re: Extracting a table when two columns share same element n

by chivracq on Thu Mar 30, 2017 6:05 pm


FCIM...! (Always mention your FCI when you open a Thread, read my Sig..., that's mostly why I never reacted to any of your 2 previous Threads, even if one mentioned IE, many Commands are not implemented for all Browsers/Versions or behave differently...)
=> FCI:
Code: Select all
iMacros v10.0.2, iMB/IE...?, if IE v...?, OS...?

(If you could mention that Info + for your 2 previous Threads as well (where you could post some Update btw, either you still have the Pb or you managed to solve those 2 Threads and you are expected to share your Sol), I won't follow up otherwise... (and I won't ask again in some future Thread(s)...?)
I normally don't even read Threads when FCI is not clearly mentioned (preferably at the top of the Opening Post in a Thread)...

But OK, I had a look at your Case as you luckily provided Script + URL, and hum..., no Difficulty at all to use Relative Positioning, I don't know what you tried, but there is no Glitch at all for R-Positioning for the 2nd Player based on the 1st Player, or even for the whole Row based on the first Element (the "SF"/"QF"/etc...), like for example:
Code: Select all
VERSION BUILD=8820413 RECORDER=FX
'SET !EXTRACT_TEST_POPUP NO
TAB T=1

'URL GOTO=https://matchstat.com/tennis/all-upcoming-matches

TAG POS=5 TYPE=TD ATTR=CLASS:round EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=CLASS:event-name EXTRACT=TXT

TAG POS=R1 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")
PROMPT _{{!EXTRACT}}_

TAG POS=R1 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")
PROMPT _{{!EXTRACT}}_

TAG POS=R1 TYPE=TD ATTR=CLASS:odds-td<SP>odds-0 EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=CLASS:odds-td<SP>odds-1 EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=CLASS:h2h EXTRACT=TXT
PROMPT _{{!EXTRACT}}_

PAUSE
SAVEAS TYPE=EXTRACT FOLDER=* FILE=T1.csv
(Tested on iMacros for FF v8.8.2, Pale Moon v26.3.3 (=FF47), Win10-x64.)

If you want to loop your Script, you only need to "play" with '!LOOP' for the first ('round') Element which seems to follow 'TAG POS=3/5/7/9/etc'... =>:
Code: Select all
SET !LOOP 1
' My_Loop = 2 * !LOOP + 1, which gives POS=3/5/7/9/etc for the 'round' Element
SET My_Loop {{!LOOP}}
ADD My_Loop {{!LOOP}}
ADD My_Loop 1

TAG POS={{My_Loop}} TYPE=TD ATTR=CLASS:round EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=CLASS:event-name EXTRACT=TXT

'/etc... all other Extracts with 'R1' as well...
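
The arithmetic behind the SET/ADD lines above can be checked outside iMacros. The JavaScript below is only an illustrative sketch: the helper name `roundPos` is made up for this example, and it assumes that ADD does numeric addition when both values are numbers. It mirrors the three steps and lists the POS values for the first four loops:

```javascript
// Mirrors: SET My_Loop {{!LOOP}} / ADD My_Loop {{!LOOP}} / ADD My_Loop 1
// roundPos is a made-up helper name, used only for this illustration.
function roundPos(loop) {
  let pos = loop; // SET My_Loop {{!LOOP}}
  pos += loop;    // ADD My_Loop {{!LOOP}}
  pos += 1;       // ADD My_Loop 1
  return pos;     // same as 2 * loop + 1
}

const positions = [1, 2, 3, 4].map(roundPos);
console.log(positions.join("/")); // 3/5/7/9
```

So each loop moves the anchor for the 'round' Element two positions further, and every 'R1' Extract after it stays relative to that anchor.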


Hum..., and all your Data Manipulation directly on '!EXTRACT' and especially on the complete '!EXTRACT' to clean up a bit the 2 Names is not Best Practice in my Opinion and could be "dangerous" and unreliable, it's better to use Temp_Vars for each Extract and to "reconstruct" the '!EXTRACT' before doing the 'SAVEAS'...
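
To illustrate why cleaning the complete '!EXTRACT' can go wrong, here is a small JavaScript sketch. It is an assumption-laden illustration, not iMacros code: it assumes the extracted values are joined with "[EXTRACT]", and the row data (including the event name with a parenthesis) is invented for the example.

```javascript
// Assumption: iMacros joins the values held in !EXTRACT with "[EXTRACT]".
const SEP = "[EXTRACT]";

// Same pattern as in the macro: a space, "(", then the rest of the line.
const stripCountry = (text) => text.replace(/ \(.+/gm, "");

// Hypothetical row; the event name happens to contain " (" as well.
const row = ["SF", "Miami (Masters)", "Player One (SUI)", "Player Two (AUS)"];

// Risky: one replace over the combined string eats everything after the first " (".
const cleanedWhole = stripCountry(row.join(SEP));

// Safer: clean only the two name values, then rebuild the row.
const cleanedPerValue = row
  .map((value, i) => (i === 2 || i === 3 ? stripCountry(value) : value))
  .join(SEP);

console.log(cleanedWhole);    // SF[EXTRACT]Miami
console.log(cleanedPerValue); // SF[EXTRACT]Miami (Masters)[EXTRACT]Player One[EXTRACT]Player Two
```

The per-value variant corresponds to the Temp_Vars approach: each name is cleaned on its own, and the row is only put back together at the end.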
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...

Re: Extracting a table when two columns share same element n

by tennisdude on Wed Jul 05, 2017 12:05 am

Hi Chivracq,

Sorry about the long delay in getting back to you.
I had given up on this project, but after some success with other scripts I decided to give it another try.

I'm using iMacros version 9.03 for FF, running on 64-bit Windows 7.

I've edited the script like you suggested, but I cannot get the regular expression to run on EXTRACT, and it seems that
some of the player-name extracts are duplicated in one column. It only happens on some rows, so it's really strange behavior.

Does anyone have any suggestions on how I can make this code produce the desired output?

Here's my code.
Code: Select all

VERSION BUILD=9030808 RECORDER=FX
TAB T=1
TAB CLOSEALLOTHERS

URL GOTO=https://matchstat.com/tennis/all-upcoming-matches

SET {{!LOOP}} 1
SET My_Loop {{!LOOP}}
ADD My_Loop {{!LOOP}}

TAG POS={{My_Loop}} TYPE=TD ATTR=CLASS:round EXTRACT=TXT
SET Round !EXTRACT
 

TAG POS=R1 TYPE=TD ATTR=CLASS:event-name EXTRACT=TXT
SET Event_name !EXTRACT


TAG POS=R1 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")
SET Player_Name1 !EXTRACT


TAG POS=R1 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")
SET Player_Name2 !EXTRACT
 

TAG POS=R1 TYPE=TD ATTR=CLASS:odds-td<SP>odds-0 EXTRACT=TXT
SET Odds1 !EXTRACT
 

TAG POS=R1 TYPE=TD ATTR=CLASS:odds-td<SP>odds-1 EXTRACT=TXT
SET Odds2 !EXTRACT
 

TAG POS=R1 TYPE=TD ATTR=CLASS:h2h EXTRACT=TXT
SET H2H !EXTRACT



ADD Round !EXTRACT
ADD Event_name !EXTRACT
ADD Player_Name1 !EXTRACT
ADD Player_Name2 !EXTRACT
ADD Odds1 !EXTRACT
ADD Odds2 !EXTRACT
ADD H2H !EXTRACT

SAVEAS TYPE=EXTRACT FOLDER=* FILE=T1.csv

The output:

R64   Wimbledon   P Lorenzi / A Mannarino(ITA/FRA)   S Clayton / J O'Mara(GBR/GBR)   2.4   1.55
R64   Wimbledon   T Bellucci / R Dutra Silva(BRA/BRA)   F Martin / D Nestor(FRA/CAN)F Martin / D Nestor(FRA/CAN)13   6   1.12
R64   Wimbledon   J Brunstrom / A Siljestrom(SWE/SWE)   V Troicki / N Zimonjic(SRB/SRB)
R64   Wimbledon   J Erlich / T Huey(ISR/PHL)   I Dodig / M Granollers(CRO/ESP)I Dodig / M Granollers(CRO/ESP)6   3.4   1.3
R64   Wimbledon   P Petzschner / A Peya(GER/AUT)   R Haase / D Inglot(NLD/)   2   1.77
R64   Wimbledon   D Marcan / T Weissborn(CRO/AUT)   F Mergea / A Qureshi(ROU/PAK)F Mergea / A Qureshi(ROU/PAK)14   6.05   1.12
R64   Wimbledon   A Behar / A Bury(URU/BLR)   M Daniell / M Demoliner(NZL/BRA)
R64   Wimbledon   D Brown / M Zverev(GER/GER)   R Bopanna / E Roger-Vasselin(IND/FRA)R Bopanna / E Roger-Vasselin(IND/FRA)8   2.8   1.42
R64   Wimbledon   P Raja / D Sharan(IND/IND)   K Edmund / J Sousa(GBR/PRT)   1.33   3.2
R64   Wimbledon   C Berlocq / A Ramos-Vinolas(ARG/ESP)   J Cabal / R Farah(COL/COL)J Cabal / R Farah(COL/COL)12
R64   Wimbledon   G Muller / S Querrey(LUX/USA)   N Mektic / F Skugor(CRO/CRO)   1.4   2.9
R64   Wimbledon   R Jebavy / J Vesely(CZE/CZE)   J Murray / B Soares(/BRA)J Murray / B Soares(/BRA)3   8   1.07
R64   Wimbledon   M Matkowski / M Mirnyi(POL/BLR)   C Hsieh / M Schnur(/USA)
R64   Wimbledon   M Reid / J Smith(AUS/AUS)   F Lopez / M Lopez(ESP/ESP)F Lopez / M Lopez(ESP/ESP)11
R64   Wimbledon   M Baghdatis / M Jaziri(CYP/TUN)   S Darcis / B Paire(BEL/)   1.75   2.05
R64   Wimbledon   S Gonzalez / D Young(MEX/USA)   P Herbert / N Mahut(FRA/FRA)P Herbert / N Mahut(FRA/FRA)2
R64   Wimbledon   Jiri Vesely(CZE)   Fabio Fognini(ITA)Fabio Fognini(ITA)28   2.1   1.72   H2H 0-2
R64   Wimbledon   Jerzy Janowicz(POL)   Lucas Pouille(FRA)Lucas Pouille(FRA)14   3   1.4   H2H 0-1
R64   Wimbledon   Nikoloz Basilashvili(GEO)   Sam Querrey(USA)Sam Querrey(USA)24   5   1.16   H2H 0-0
R64   Wimbledon   Ruben Bemelmans(BEL)   Daniil Medvedev(RUS)   5.5   1.14   H2H 0-0
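
Going by the rows above, the names seem to come out with no space before the opening parenthesis, while the pattern in the macro requires one. That may be one reason the regular expression appears not to run. The JavaScript sketch below tests this on three values copied from the output. The `relaxed` pattern is only a hypothetical alternative, and the sketch assumes the cell text really looks like these rows:

```javascript
// Pattern from the macro: needs a literal space before "(".
const original = / \(.+/gm;

// Hypothetical alternative: optional whitespace, then everything from the first "(".
const relaxed = /\s*\([\s\S]*/;

// Values copied from the posted output.
const cells = [
  "P Lorenzi / A Mannarino(ITA/FRA)",
  "F Martin / D Nestor(FRA/CAN)F Martin / D Nestor(FRA/CAN)13",
  "Fabio Fognini(ITA)Fabio Fognini(ITA)28",
];

const withOriginal = cells.map((cell) => cell.replace(original, ""));
const withRelaxed = cells.map((cell) => cell.replace(relaxed, ""));

console.log(withOriginal); // unchanged, no " (" anywhere in these values
console.log(withRelaxed);  // names only: the country codes, the repeated name and the trailing number are cut
```

Inside EVAL the backslashes would presumably need doubling, as the macro already does with `\\(`. The relaxed pattern would also truncate any name that itself contains a parenthesis, so it is worth checking with PROMPT on a few rows first.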

Re: Extracting a table when two columns share same element n

by chivracq on Thu Jul 06, 2017 7:27 am

Hum..., 3 months later indeed, never too late I guess...

OK, so you are now on FF, your Thread started about iMB/IE, OK, and we finally have your FCI, even if your FF Version is still missing..., I guess it's too complicated to mention 3 Versions...!?
=> I reckon you'll probably be on FF54 + iMacros for FF v9.0.3 + Win7-x64.

Oh...!, but you missed the following part in my previous Reply...:
(If you could mention that Info + for your 2 previous Threads as well (where you could post some Update btw, either you still have the Pb or you managed to solve those 2 Threads and you are expected to share your Sol), I won't follow up otherwise... (and I won't ask again in some future Thread(s)...?)
I normally don't even read Threads when FCI is not clearly mentioned (preferably at the top of the Opening Post in a Thread)...

=> Sorry, but I don't follow up...
You need to use the Forum "a bit correctly" if you want me to help you..., and that applies to all your Threads...:
- Thread 1: Cannot connect VB script to Access .accdb
- Thread 2: Re: Can't connect to Database

But hum, for this one, from a quick look at your Script, you could read a bit of Documentation (+ reading a few Pages of Forum Threads is a good Practice as well, if you want to understand a little bit how iMacros works... + look at the Demo-Macros...), you haven't understood how to use Variables with iMacros, you have a complete Wiki Page dedicated to Variables... (and '!EXTRACT' is a Var as well btw...) + use 'PROMPT' to follow/debug your Vars...

Re: Extracting a table when two columns share same element n

by tennisdude on Thu Jul 06, 2017 8:40 am

Hi Chivracq,


Yes, the versions you mentioned are correct. I'll do my best to use a post template for future threads to benefit the forum.
I've used PROMPT on the extract and it stores the correct elements I want, so I'm not sure why it fails a few rows down in this case.
Is there another tag I could use to target the column of data I want more directly?
This script is needed to trigger another script to gather stats after it collects the daily matchups.
I have the other script working but I need this one to work first.

Once I do get this script working I'll be able to concentrate on the other scripts for the DB.


Thanks,
t

Re: Extracting a table when two columns share same element n

by tennisdude on Fri Jul 07, 2017 2:40 pm

I'm a little stuck at the moment.

Can anyone point me in the right direction?