[code] Scraping API Docs

Post by Jim » Sat Feb 11, 2012 2:20 pm

Example code for scraping the API docs pages. It might be useful for indexing or analysis.

Use at your own risk: if you run it too often, Google will temporarily block you, and I don't know what happens if you keep going after that. I didn't look it up, but you should assume scraping violates Google's Terms of Service.
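
One way to lower the chance of getting blocked is to pause between page requests. Here is a minimal sketch of that idea. The helper name each_with_delay is made up for illustration and is not part of the script in this post, and the one-second default is an arbitrary guess, not a documented limit.

```ruby
# Hypothetical helper (not part of the original script): yields each item
# to the block and sleeps between calls so requests are spaced out.
# The default delay is an assumption, not a known safe value.
def each_with_delay(items, delay_seconds = 1.0)
  items.each_with_index do |item, index|
    sleep(delay_seconds) if index > 0  # no pause before the first item
    yield item
  end
end

# Usage sketch: the block body stands in for the real page fetch.
each_with_delay(%w[face edge], 0.1) { |name| puts "would fetch #{name}" }
```

In the script, the per-class loop could be wrapped the same way, so that each page fetch happens inside the block.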

Just run it once and direct output to a file:

$ ruby scrape.rb > output.txt

Code:
# Scrapes API docs for class names, method names, and method versions.
require 'open-uri'
require 'nokogiri'

# Ctrl-C handler (tested on WinXP): set a flag so the loops below can stop cleanly.
trap("INT") {
    $stderr.puts "abort."
    @abort = true
}

base = "https://developers.google.com/"
class_index_url = base + "sketchup/docs/classes"

# URI.open is needed on Ruby 3.0+, where a bare open() no longer accepts URLs.
page = Nokogiri::HTML(URI.open(class_index_url))

classes = {}

page.css(".columns a").each do |link|
    classes[link.text] = link['href']
    break if @abort
end

exit if @abort

classes.each do |name, url|
    puts name
    # The href collected above is not used; the page URL is rebuilt from the class name.
    loc = base + "sketchup/docs/ourdoc/" + name.downcase
    page = Nokogiri::HTML(URI.open(loc))
    page.css(".apireference").each do |elem|
        method_name    = elem.css(".itemname").text
        method_version = elem.css(".version").text
        puts "#{method_name},#{method_version}"
    end
    puts
    break if @abort
end
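
For the indexing or analysis step, the saved output can be read back into a nested hash. This is only a sketch under the assumption that output.txt has exactly the layout the script prints: a class name on its own line, then one "method,version" line per method, then a blank line. The function name parse_scrape_output and the sample data are made up for illustration.

```ruby
# Hypothetical helper: turns the scraper's text output into
# { class_name => { method_name => version } }.
# Assumes: class name line, then "method,version" lines, then a blank line.
def parse_scrape_output(text)
  result = {}
  current = nil
  text.each_line do |line|
    line = line.chomp
    if line.empty?
      current = nil                  # a blank line ends the current class
    elsif current.nil?
      current = result[line] = {}    # first line of a group is the class name
    else
      method_name, version = line.split(",", 2)
      current[method_name] = version.to_s
    end
  end
  result
end

# Made-up sample data, only to show the shape of the result.
sample = "Face\narea,SketchUp 6.0+\nedges,SketchUp 6.0+\n\nEdge\nlength,SketchUp 6.0+\n"
index = parse_scrape_output(sample)
puts index["Face"].keys.inspect
```

With a real run you would call parse_scrape_output(File.read("output.txt")) instead of using the sample string.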