Use at your own risk: if you run it "too much," Google will temporarily block you. Who knows what happens if you "abuse" it. I didn't look it up, but it is safest to assume that scraping violates Google's Terms of Service.
Just run it once and direct output to a file:
$ ruby scrape.rb > output.txt
# Scrapes API docs for class names, method names, and method versions.
require 'open-uri'
require 'nokogiri'
# ctrl-c on WinXP
trap("INT") do
  $stderr.puts "abort."
  @abort = true
end
base = "https://developers.google.com/"
class_index_url = base + "sketchup/docs/classes"
# URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs.
page = Nokogiri::HTML(URI.open(class_index_url))
classes = {}
page.css(".columns a").each do |link|
  classes[link.text] = link['href']
  break if @abort
end
exit if @abort
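classes.each do |name, url|
  puts name
  # The href stored in url is not used; the page URL is rebuilt from the class name.
  loc = base + "sketchup/docs/ourdoc/" + name.downcase
  # URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs.
  page = Nokogiri::HTML(URI.open(loc))
  page.css(".apireference").each do |elem|
    method_name = elem.css(".itemname").text
    method_version = elem.css(".version").text
    puts "#{method_name},#{method_version}"
  end
  puts
  break if @abort
end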
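
Since running this "too much" can get you blocked, one way to be gentler is to pause between page requests. The following is only a sketch of that idea, not part of the script above: each_with_pause is a made-up helper name, the 0.1 second delay is an arbitrary guess (I have no idea what rate Google actually tolerates), and the class names are placeholders.

```ruby
# Hypothetical helper, not part of the original script: yields each item
# in order and sleeps `delay` seconds between items (not before the first).
def each_with_pause(items, delay)
  items.each_with_index do |item, index|
    sleep(delay) if index > 0 && delay > 0
    yield item
  end
end

# Usage with placeholder class names. In the scraper, the block body would
# be the fetch-and-parse step for a single class page.
each_with_pause(%w[Face Edge Model], 0.1) do |name|
  puts name
end
```

In the script, the second loop could then be written as each_with_pause(classes, 1) do |name, url| ... end, with a longer delay if you want to play it safe.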