Site now uses infinite scrolling - what to do?


by mfletcher on Wed Sep 20, 2017 8:40 am

I'm sorry, I'm very new to this. A friend wrote this a long time ago. I reckon I have to tell `extractFromLibrary` to use some scroll mechanism instead of parsing through pages? Thanks!

Code: Select all
// Instead of just extracting all the books from one library, this script imports a list of libraries from CSV
// and for each, saves all of their books to the output CSV

// To speed up the running of this script, the following iMacros preferences are recommended...
// Go to iMacros options, then the "general" tab
// Set "Replay Speed" to fast
// Under "Visual Effects", untick "scroll to object when found" as well as "Highlight object when found"
// Under "Javascript scripting settings", untick "Show Javascript during replay"

// File is read from the "datasources" path set in iMacros prefs, not "downloads" path pref
const inputFileName= "Libraries-to-extract-from.csv";
// The starting row number (to enable importing just part of a large CSV)
const startRowID = 1;
// Name of the file where the results are output (is saved into the "Downloads" folder set in iMacros prefs)
// NB: Every time this script is run, the results are just added to the end of this file.
// So delete/rename the output file if needed - to avoid duplicate entries.
const outputFileName= "Books-url-list.csv";

// ###################

// Global variable for status message, since using iimDisplay() clears previous messages
var statusMessage;

addStatusMessage("Importing " + inputFileName + ", starting at line " + startRowID);

// For each library in the CSV file, import all of their books
var rowID = startRowID;
while (true) {
    // Not using addStatusMessage() directly, since want to throw away this last message afterwards
    iimDisplay(statusMessage + "\n-Processing row " + rowID);
    var currentRowContents = getCSVRow(rowID);
    if (!currentRowContents) {
        // Break if end of file reached, or if there was an error reading the file (eg file not found)
        addStatusMessage("Exiting on row " + rowID + ". Either an error has occurred, or the end of file was reached.");
        break;
    }
    extractFromLibrary(currentRowContents);
    rowID++;
}

function extractFromLibrary(targetLibrary) {
    // URL of the books page to process
    var targetBooksPage = targetLibrary + "/books";
    goToPage(targetBooksPage);

    var lastPageID = getLastPageID();
    addStatusMessage("Saving pages 1->" + lastPageID + " for " + targetBooksPage);

    for (var i = 1; i <= lastPageID; i++) {
        // Not using addStatusMessage() directly, since want to throw away this last message afterwards
        iimDisplay(statusMessage + "\n-Processing page " + i);
        // Start of script navigated to page 1 already, so only need to change if i is not 1
        if (i != 1) goToPage(targetBooksPage + "?page=" + i);
        processCurrentPage();
    }
}


/* Helper Functions */

function runMacro(macro) {
    // Runs the specified macro with a reduced tag timeout of 3 seconds (default is 60)
    return iimPlay("CODE:" + "SET !TIMEOUT_TAG 3\n" + macro);
}
function addStatusMessage(newMessage) {
    // Using iimDisplay() clears previous messages, so global statusMessage variable used to save them
    if (!statusMessage) {
        statusMessage = "Starting script...";
    }
    statusMessage += "\n-" + newMessage;
    iimDisplay(statusMessage);
}
function getCSVRow(rowID) {
    var result = runMacro("SET !DATASOURCE " + inputFileName +
    "\nSET !DATASOURCE_COLUMNS 1" +
    "\nSET !DATASOURCE_LINE " + rowID +
    "\nSET !EXTRACT {{!COL1}}");
    if (result < 0) {
        // Fetching the row failed. Could be due to end of file or else file not found.
        return null;
    } else {
        return iimGetLastExtract(1);
    }
}
function goToPage(url) {
    // Navigates to the desired URL with images turned off, to decrease pageload time
    runMacro("FILTER TYPE=IMAGES STATUS=ON" +
            "\nURL GOTO=" + url);
}
function getLastPageID() {
    // Extract the page ID of the last page of books, using relative positioning numbering
    // The site uses "Page 1", "Page 2", "...", "Page N", "Next" type site navigation
    // First finds the "Next" link, then extracts the link text immediately prior to it, to get last page ID
    runMacro("TAG POS=1 TYPE=A ATTR=TXT:Next EXTRACT=TXT" +
            "\nTAG POS=R-1 TYPE=A ATTR=TXT:* EXTRACT=TXT");
    var lastPageID;
    if (iimGetLastExtract(2) == "#EANF#") {
        // Tags not found, or timeout reached
        addStatusMessage("No next page button found, so there must only be one page total" +
                " (or else the page didn't finish loading in 60s).");
        lastPageID = 1;
    } else {
        // Tags found, so use the link text value
        lastPageID = iimGetLastExtract(2);
    }
    return lastPageID;
}
function processCurrentPage() {
    var i = 0;
    while (true) {
        i++;
        // Attempt extraction of next library book link
        // Note: extraction and saving to CSV were not combined, since hard/impossible to know when to stop,
        // since logic not possible inside macros - and whenever SAVEAS TYPE=EXTRACT is used, the
        // EXTRACT variable is cleared. So iimGetLastExtract(1) always returns null, regardless of success or
        // failure. Even if the iimPlay return code was checked instead, #EANF# junk would still have been added
        // to the last row of the CSV, which isn't desired.
        // To reduce the slowdown caused by splitting the steps, the EXTRACT variable is manually set before
        // using SAVEAS, rather than wasting time using TAG again.
        runMacro("TAG POS=" + i + " TYPE=A ATTR=CLASS:library-link&&TITLE: EXTRACT=HREF");
        var currentLibraryURL = iimGetLastExtract(1);
        // If that link was found, save to the next line of the CSV, otherwise break out of loop
        if (currentLibraryURL == "#EANF#") {
            break;
        } else {
            runMacro("SET !EXTRACT " + currentLibraryURL +
                    "\nSAVEAS TYPE=EXTRACT FOLDER=* FILE=" + outputFileName);
        }
    }
}
mfletcher
 
Posts: 8
Joined: Wed Sep 20, 2017 8:22 am

Re: Site now uses infinite scrolling - what to do?

by chivracq on Wed Sep 20, 2017 8:56 am

mfletcher wrote:I'm sorry, I'm very new to this. A friend wrote this a long time ago. I reckon I have to tell `extractFromLibrary` to use some scroll mechanism instead of parsing through pages? Thanks!


FCIM...! :mrgreen: (Read my Sig...)

But Compliments already to your Friend, the Script is nicely written, probably by a Professional Programmer used to working in a Team with other Programmers on the same Project/Code... We don't see that "Quality" very often on the Forum, ah-ah...! :oops:
Script is a bit old indeed, maybe from 5 or 6 years ago as I see some deprecated Command(s), ah-ah...!
But OK, mention your FCI for me to "elaborate"... :idea:
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 7062
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Site now uses infinite scrolling - what to do?

by mfletcher on Wed Sep 20, 2017 9:44 am

Yeah, he's very good isn't he :)

I'm on Firefox 55.0.3 with iMacros for Firefox 9.0.3 on Windows 10.

Re: Site now uses infinite scrolling - what to do?

by chivracq on Wed Sep 20, 2017 11:29 am

mfletcher wrote:Yeah, he's very good isn't he :)

I'm on
Code: Select all
Firefox 55.0.3 with iMacros for Firefox 9.0.3 on Windows 10.

OK, FCI mentioned, perfect, now we can talk...! :D
(Always mention your FCI when you open a Thread (or post for the first time in some existing Thread), I don't react otherwise... :idea: , and many Commands are not implemented for all Browsers/Versions...)

Answering first your original Qt, the Scrolling is usually achieved using the following Statement/Syntax (from an '.iim' (on-the-fly or native) Macro):
Code: Select all
URL GOTO=javascript:window.scrollBy(0,500)

But you are on iMacros for FF v9.0.3 and I'm not sure this Syntax still works there: right after v9.0.3 for FF got released (Aug. 2016), several Threads appeared on the Forum about this Version breaking the 'scrollBy()' Syntax...
I can't confirm it myself, as I've never installed v9.0.3, which was a bit too buggy and limited from Day_1... The previous Version (v8.9.7 for FF) is still the advised/stable Version to use, and it still works on FF v55.0.3. (I run it myself in this exact same FCI, + Win10-x64 as well.)
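For a '.js' Script, the same Statement can be sent through a small helper. The following is only a minimal sketch, not tested on v9.0.3: `buildScrollMacro` and `scrollDown` are hypothetical names, and `play` stands for whatever runs a macro string and returns a status code (in iMacros that would be a wrapper around `iimPlay("CODE:" + macro)`). The `WAIT` gives newly requested content some time to load between rounds.

```javascript
// Builds the one-line macro that scrolls the page down by `pixels`.
function buildScrollMacro(pixels) {
    return "URL GOTO=javascript:window.scrollBy(0," + pixels + ")";
}

// Scrolls `rounds` times, pausing `waitSeconds` after each scroll.
// `play` is injected so the helper does not depend on iMacros being present;
// it must take a macro string and return a negative number on failure.
function scrollDown(play, rounds, pixels, waitSeconds) {
    for (var i = 0; i < rounds; i++) {
        var code = play(buildScrollMacro(pixels) + "\nWAIT SECONDS=" + waitSeconds);
        if (code < 0) {
            return code; // stop at the first failing round
        }
    }
    return 1;
}

// Inside iMacros (assumption: iimPlay is available in the scripting scope):
// scrollDown(function (m) { return iimPlay("CODE:" + m); }, 5, 500, 2);
```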

If you don't want to revert to v8.9.7, Scrolling (down) can also be achieved using the Keyboard 'Spacebar' which can be input from a Macro using the 'EVENT' Mode. That could be a Workaround. :idea:

I mentioned that your Script was using a few deprecated Commands, '!TIMEOUT_TAG' is one of them (replaced by '!TIMEOUT_STEP'), even if it still works, at least until v8.9.7, and I would actually be curious to know if it still works in v9.0.3...?, as several deprecated Commands were actually completely removed from the Code for v9.0.3...?

Other deprecated Command would be 'iimGetLastExtract()' (replaced by 'iimGetExtract()'), but I know that this one still works in v9.0.3.
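As a rough sketch of what switching to the replacement Commands could look like in the Script's helpers: `withStepTimeout` and `getExtract` are hypothetical names, and whether '!TIMEOUT_STEP' and 'iimGetExtract()' behave identically to the old ones in every Version is an assumption to verify in your own FCI. The getter falls back to the older function when the newer one is not defined.

```javascript
// Prefixes a macro with the step timeout (in seconds),
// using !TIMEOUT_STEP instead of the deprecated !TIMEOUT_TAG.
function withStepTimeout(macro, seconds) {
    return "SET !TIMEOUT_STEP " + seconds + "\n" + macro;
}

// Reads extracted value number `n`, preferring iimGetExtract() and
// falling back to the deprecated iimGetLastExtract() when needed.
// Returns null when neither function exists (e.g. outside iMacros).
function getExtract(n) {
    if (typeof iimGetExtract === "function") {
        return iimGetExtract(n);
    }
    if (typeof iimGetLastExtract === "function") {
        return iimGetLastExtract(n);
    }
    return null;
}

// Inside iMacros, runMacro() from the Script could then become:
// function runMacro(macro) {
//     return iimPlay("CODE:" + withStepTimeout(macro, 3));
// }
```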

Some other Command that your Script uses and that might cause a Pb is 'FILTER'. :oops:
It is not deprecated at all in v9.0.3, but I have reported that it got broken (using v8.9.7 for FF) from FF53 onwards (possibly already from FF52: I went straight from FF51 to FF53 and then to FF54, and it was still working fine in FF v51.0.1). No other User has confirmed my Report so far, though.
The Script ('.iim', I only use '.iim', I don't use any '.js') just hangs, and the Browser (FF53/54/55) hangs forever and needs to be killed from Task Manager. I'm talking about a "large" Page with 1000 Images. A "small" Page with a few Images will eventually still manage to load (after 40 sec / 1 min / 2 min) and the Browser will not crash, but that defeats the Purpose of using 'FILTER', as a Page is supposed to load instantly when using this Command... :shock:
- 'FILTER TYPE=IMAGES' crashes FF53/54...!
=> I don't know if v9.0.3 for FF is impacted as well but I would be interested to know if it gets broken as well in that Version (+ your current FF v55.0.3).

Re: Site now uses infinite scrolling - what to do?

by mfletcher on Fri Sep 22, 2017 6:12 am

Thank you so much for your answer :)

Please allow for some time to pass while I try to update this script.

Have a good weekend!

Re: Site now uses infinite scrolling - what to do?

by chivracq on Fri Sep 22, 2017 9:01 am

mfletcher wrote:Thank you so much for your answer :)

Please allow for some time to pass while I try to update this script.

Have a good weekend!

Yeah-yeah, no Pb, don't worry, I'll notice your Reply once you'll have had (hum, funny grammatical Construction...!) the time to adjust your Script. And if you decide to revert to v8.9.7, please first test/confirm whether '!TIMEOUT_TAG' + 'FILTER' still work in v9.0.3 before reverting. :wink:

Re: Site now uses infinite scrolling - what to do?

by mfletcher on Tue Jan 09, 2018 4:22 am

Greetings and pardon the late reply!

Any chance you could help me in private, say in exchange for some bitcoins?

Cheers!
Last edited by mfletcher on Tue Jan 09, 2018 5:19 am, edited 1 time in total.

Re: Site now uses infinite scrolling - what to do?

by chivracq on Tue Jan 09, 2018 5:15 am

mfletcher wrote:Greetings and pardon the late reply!

Any chance you could help me in private, say in exchange for some bitcoins? Turns out v8.9.7 isn't compatible with the latest Firefox. This makes for a vicious circle: the latest iMacros requires a ton of workarounds, while the preferred iMacros won't work unless I downgrade Firefox (sadly not an option).

Cheers!

Hum..., more than 3 months later indeed, ah-ah...! :wink:

I don't have a Bitcoin "Account"..., hum..., maybe an Idea actually...

But..., woaf-woaf...!, you are using a '.js' Script, so you have to stick to iMacros for FF indeed..., but afaik, both v8.9.7 and v9.0.3 don't work on FF anymore from FF v57... There is a v10.0 Version for FF supposed to come out "one day", but we've been waiting for several months already and we haven't heard anything... :oops:
Best Option would be to use FF Portable maybe, then you can use FF v55.0.3 + iMacros for FF v8.9.7. :idea:
Last edited by chivracq on Tue Jan 09, 2018 5:24 am, edited 1 time in total.

Re: Site now uses infinite scrolling - what to do?

by mfletcher on Tue Jan 09, 2018 5:22 am

Oh hey :) Didn't see you there before I edited my post :)

Great idea with the portable, great idea indeed! I'm on it.

Re: Site now uses infinite scrolling - what to do?

by mfletcher on Tue Jan 09, 2018 6:14 am

Here's the actual script: https://gist.github.com/snoopiesnax/bd9 ... 1a55da23e0 -- would be great if we could continue via PM or email ([...]@protonmail.com) - thanks!
Last edited by mfletcher on Tue Jan 16, 2018 5:43 am, edited 1 time in total.

Re: Site now uses infinite scrolling - what to do?

by mfletcher on Tue Jan 09, 2018 2:44 pm

I think I've made some significant improvements. So far I've commented out lastPageID / getLastPageID. Also removed goToPage() and put its content inside extractFromUser(). Did the same with processCurrentPage() which no longer needs a while loop I think?

Is runMacro("URL GOTO=javascript:window.scrollBy(0,500)"); the right syntax to use, and how do I know when there's nothing left to load?

Cheers!

Re: Site now uses infinite scrolling - what to do?

by chivracq on Tue Jan 09, 2018 6:24 pm

mfletcher wrote:Here's the actual script: https://gist.github.com/snoopiesnax/bd9 ... 1a55da23e0 -- would be great if we could continue via PM or email ([...]@protonmail.com) - thanks!

Needs to convert from parsing page id URLs to using scrollBy() for infinite scroll. Currently getting:

Code: Select all
Starting script...
-Importing Users-to-extract-from.csv, starting at line 1
-Saving http://soundcloud.com/vildetuv/followers
-Exiting on row 2. Either an error has occurred, or the end of file was reached.


Code: Select all
// Instead of just extracting all the followers from one user, this script imports a list of users from CSV
// and for each, saves all of their followers to the output CSV

// To speed up the running of this script, the following iMacros preferences are recommended...
// Go to iMacros options, then the "general" tab
// Set "Replay Speed" to fast
// Under "Visual Effects", untick "scroll to object when found" as well as "Highlight object when found"
// Under "Javascript scripting settings", untick "Show Javascript during replay"

// File is read from the "datasources" path set in iMacros prefs, not "downloads" path pref
const inputFileName= "Users-to-extract-from.csv";
// The starting row number (to enable importing just part of a large CSV)
const startRowID = 1;
// Name of the file where the results are output (is saved into the "Downloads" folder set in iMacros prefs)
// NB: Every time this script is run, the results are just added to the end of this file.
// So delete/rename the output file if needed - to avoid duplicate entries.
const outputFileName= "Followers-url-list.csv";

// ###################

// Global variable for status message, since using iimDisplay() clears previous messages
var statusMessage;

addStatusMessage("Importing " + inputFileName + ", starting at line " + startRowID);

// For each user in the CSV file, import all of their followers
var rowID = startRowID;
while (true) {
    // Not using addStatusMessage() directly, since want to throw away this last message afterwards
    iimDisplay(statusMessage + "\n-Processing row " + rowID);
    var currentRowContents = getCSVRow(rowID);
    if (!currentRowContents) {
        // Break if end of file reached, or if there was an error reading the file (eg file not found)
        addStatusMessage("Exiting on row " + rowID + ". Either an error has occurred, or the end of file was reached.");
        break;
    }
    extractFromUser(currentRowContents);
    rowID++;
}

function extractFromUser(targetUser) {
    // URL of the followers page to process
    var targetFollowersPage = targetUser + "/followers";

    // Navigates to the desired URL with images turned off, to decrease pageload time
    runMacro("FILTER TYPE=IMAGES STATUS=ON" +
            "\nURL GOTO=" + targetFollowersPage);

    addStatusMessage("Saving " + targetFollowersPage);

/*
    var lastPageID = getLastPageID();
    for (var i = startFromPageID; i <= lastPageID; i++) {
        // Not using addStatusMessage() directly, since want to throw away this last message afterwards
        iimDisplay(statusMessage + "\n-Processing page " + i);
        // Start of script navigated to page 1 already, so only need to change if i is not 1
        if (i != 1) goToPage(targetFollowersPage + "?page=" + i);
    }
*/

    // !!!!! START INFINITE SCROLLING
    runMacro("URL GOTO=javascript:window.scrollBy(0,500)");

    var currentUserURL = iimGetLastExtract(1);

    runMacro("SET !EXTRACT " + currentUserURL +
                "\nSAVEAS TYPE=EXTRACT FOLDER=* FILE=" + outputFileName);
}


/* Helper Functions */

function runMacro(macro) {
    // Runs the specified macro with a reduced tag timeout of 3 seconds (default is 60)
    return iimPlay("CODE:" + "SET !TIMEOUT_TAG 3\n" + macro);
}
function addStatusMessage(newMessage) {
    // Using iimDisplay() clears previous messages, so global statusMessage variable used to save them
    if (!statusMessage) {
        statusMessage = "Starting script...";
    }
    statusMessage += "\n-" + newMessage;
    iimDisplay(statusMessage);
}
function getCSVRow(rowID) {
    var result = runMacro("SET !DATASOURCE " + inputFileName +
    "\nSET !DATASOURCE_COLUMNS 1" +
    "\nSET !DATASOURCE_LINE " + rowID +
    "\nSET !EXTRACT {{!COL1}}");
    if (result < 0) {
        // Fetching the row failed. Could be due to end of file or else file not found.
        return null;
    } else {
        return iimGetLastExtract(1);
    }
}
/*
function goToPage(url) {
}
*/
/*
function getLastPageID() {
    // Extract the page ID of the last page of followers, using relative positioning numbering
    // The site uses "Page 1", "Page 2", "...", "Page N", "Next" type site navigation
    // First finds the "Next" link, then extracts the link text immediately prior to it, to get last page ID
    runMacro("TAG POS=1 TYPE=A ATTR=TXT:Next EXTRACT=TXT" +
            "\nTAG POS=R-1 TYPE=A ATTR=TXT:* EXTRACT=TXT");
    if (iimGetLastExtract(2) == "#EANF#") {
        // Tags not found, or timeout reached
        addStatusMessage("No next page button found, so there must only be one page total" +
                " (or else the page didn't finish loading in 60s).");
        lastPageID = 1;
    } else {
        // Tags found, so use the link text value
        lastPageID = iimGetLastExtract(2);
    }
    return lastPageID;
}
*/

/*
function processCurrentPage() {
    var i = 0;
    while (true) {
        i++;

    }
}
*/


I'm OK to give you some Advice and Suggestions but I'm not too keen on helping you "too precisely" with your Script as I don't help for Social Media for Like/Comment/Follow, sorry... Your original Qt was about "Infinite Scrolling", I'll stick to that part...

mfletcher wrote:I think I've made some significant improvements. So far I've commented out lastPageID / getLastPageID. Also removed goToPage() and put its content inside extractFromUser(). Did the same with processCurrentPage() which no longer needs a while loop I think?

Is runMacro("URL GOTO=javascript:window.scrollBy(0,500)"); the right syntax to use, and how do I know when there's nothing left to load?

Cheers!

Yep, correct Syntax, like I had already mentioned in my first Reply..., hum well, first Reply after you had mentioned your FCI as I don't help either if FCI is not mentioned...
You have some Alternatives as well by using the 'EVENT' Mode on 'PgDn' or 'End' for example, 'Space' doesn't work for that purpose on SoundCloud as it toggles the 'Play'/'Pause' of the Player.
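A keyboard-based scroll could be sketched roughly as below. Treat the exact 'EVENT' form as an assumption to check against your own iMacros Version: `TYPE=KEYPRESS` with a `SELECTOR` and a numeric `KEY` is the form I would expect, the `HTML>BODY` selector is only a placeholder, and 34/35 are the usual DOM key codes for 'PgDn'/'End'. `SCROLL_KEYS` and `buildKeyScrollMacro` are hypothetical names.

```javascript
// Usual DOM key codes for the keys that scroll a page down.
// Space (32) is left out on purpose: on SoundCloud it toggles the Player.
var SCROLL_KEYS = { PAGE_DOWN: 34, END: 35 };

// Builds an EVENT macro line that presses the given key on `selector`.
// Throws for any key name that is not listed in SCROLL_KEYS.
function buildKeyScrollMacro(keyName, selector) {
    var keyCode = SCROLL_KEYS[keyName];
    if (keyCode === undefined) {
        throw new Error("Unsupported scroll key: " + keyName);
    }
    return "EVENT TYPE=KEYPRESS SELECTOR=\"" + selector + "\" KEY=" + keyCode;
}

// Inside iMacros (assumption: runMacro() from the Script is in scope):
// runMacro(buildKeyScrollMacro("END", "HTML>BODY"));
```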

Several ways to find out "when there's nothing left to load" I would think...:
- You can use Negative Relative Positioning from the Bottom of the Page (the Player I reckon) by extracting the first (= the last (visible) on the Page) Element and comparing after the next "Scroll Round"... If it's the same, then you know you've reached the end of the Page...!
- Or maybe easier, you have a Counter with the Nb of Followers on the Front Page for each User that you can extract, then you keep a Counter yourself in your Script of how many Followers you extract, and you'll know when to stop...!
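Both Ideas above can be combined in one loop. This is only a sketch of the control flow, with the page access left out on purpose: `scrollUntilDone` is a hypothetical name, and `scrollOnce`, `readLastItem` and `countItems` are callbacks you would have to write yourself (e.g. around `runMacro()` and a 'TAG ... EXTRACT' on the last visible Element). A `maxRounds` cap is added as a safety net so the Script can never loop forever.

```javascript
// Scrolls until one of three things happens:
//  - the expected number of items is reached (counter approach),
//  - the last visible item did not change after a scroll (comparison approach),
//  - maxRounds scrolls were done (safety cap).
// options: { maxRounds, expectedCount (optional), countItems (optional) }
function scrollUntilDone(scrollOnce, readLastItem, options) {
    var previousLast = readLastItem();
    for (var round = 1; round <= options.maxRounds; round++) {
        if (options.expectedCount !== undefined && options.countItems &&
                options.countItems() >= options.expectedCount) {
            return { reason: "count-reached", rounds: round - 1 };
        }
        scrollOnce();
        var currentLast = readLastItem();
        if (currentLast === previousLast) {
            // Same last item as before the scroll: nothing new was loaded.
            return { reason: "no-new-items", rounds: round };
        }
        previousLast = currentLast;
    }
    return { reason: "max-rounds", rounds: options.maxRounds };
}
```

Note that a slow Page can look "finished" if the comparison runs before the new Items have loaded, so `scrollOnce` should include a wait that is long enough for the Site.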

>>>

Funny Detail but your "Vilde Tuv" probably took her inspiration for 3 of her Songs (the 3 Tracks from her "My tunes" mini-Playlist) from a good Friend of mine, ah-ah...!, (ex-)Singer from a Band from Bergen as well ("Bergen Beach Band") that she indeed obviously knows as she is following them on SoundCloud + my Friend as a solo Singer as well, ah-ah...!
And I actually even play one of their Tracks sometimes ("17. Mai Sangen"), ah-ah...! (I'm a DJ IRL...)

Re: Site now uses infinite scrolling - what to do?

by mfletcher on Wed Jan 10, 2018 2:42 am

chivracq wrote:I'm OK to give you some Advice and Suggestions but I'm not too keen on helping you "too precisely" with your Script as I don't help for Social Media for Like/Comment/Follow, sorry... Your original Qt was about "Infinite Scrolling", I'll stick to that part...


Thanks. For the record I'm just trying to visualize/map connections like http://mashable.com/2009/08/21/gorgeous ... lizations/ using https://d3js.org/.

