Random Problem of the Night
18 Nov 2012
I'm writing this up on the blog for posterity, since right now it only lives in a gist.
We have a pretty big site at work (probably 1000+ pages), and I needed to find every link on the site that pointed to a certain page and see which query string parameters those links used.
Each query string could have multiple keys and values, and all I needed was an array of the unique values for each key.
Here is the code:
I used crawl, which returns a JSON object for every page on the site containing an array of the links on that page. Then, using underscore, I plucked, flattened, compacted, uniqued, and mapped those arrays so that they contained only internal links for the domain in question. Finally, using ent and qs, I parsed all the query string values into unique arrays. See the comments in the code above for more specific details.
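As a rough sketch of the same pipeline (not the original gist code), here is a version that swaps underscore, qs, and ent for Node built-ins: `flatMap` stands in for pluck + flatten, `filter(Boolean)` plus a `Set` for compact + uniq, and `URL`/`URLSearchParams` for qs. It assumes each crawled page looks like `{ url, links }`, which may not match crawl's real output. `queryValuesByKey` and `decodeAmp` are hypothetical names, and `decodeAmp` is a deliberately minimal stand-in for ent that only decodes `&amp;`. It needs Node 12 or later.

```javascript
// ASSUMPTION: each crawled page has the shape { url: "...", links: ["...", ...] }.
// The real crawl output may use different field names.

// Hypothetical, minimal stand-in for `ent`: only decodes &amp;, the entity most
// likely to show up inside an href's query string. Without this, "a=1&amp;b=2"
// would parse as a key named "amp;b".
function decodeAmp(href) {
  return href.replace(/&amp;/g, "&");
}

// Collect the unique query string values per key for every internal link
// that points at `targetPath` on `domain`.
function queryValuesByKey(pages, domain, targetPath) {
  // pluck + flatten: one flat array of every link on every page
  const links = pages.flatMap((page) => page.links || []);
  // compact + uniq: drop null/empty entries, then duplicates
  const uniqueLinks = [...new Set(links.filter(Boolean))];

  const valuesByKey = {};
  for (const raw of uniqueLinks) {
    let url;
    try {
      // Resolve relative links against the domain
      url = new URL(decodeAmp(raw), `http://${domain}`);
    } catch (err) {
      continue; // skip hrefs that can't be parsed as URLs
    }
    // Keep only internal links to the page in question
    if (url.hostname !== domain || url.pathname !== targetPath) continue;
    for (const [key, value] of url.searchParams) {
      if (!valuesByKey[key]) valuesByKey[key] = new Set();
      valuesByKey[key].add(value);
    }
  }
  // Convert each Set to a plain array so the result is JSON-friendly
  return Object.fromEntries(
    Object.entries(valuesByKey).map(([key, values]) => [key, [...values]])
  );
}

// Usage with made-up sample data:
const pages = [
  { url: "http://example.com/", links: ["/signup?ref=home&amp;plan=pro", "/about", null] },
  {
    url: "http://example.com/blog",
    links: ["http://example.com/signup?ref=blog", "/signup?ref=home", "http://other.com/signup?ref=x"],
  },
];
console.log(queryValuesByKey(pages, "example.com", "/signup"));
// → { ref: [ 'home', 'blog' ], plan: [ 'pro' ] }
```

Using a `Set` per key keeps each value array unique without a separate uniq pass at the end, and links to other hosts or other paths (like `other.com` and `/about` above) are dropped before any query string parsing happens.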