Scraping with Ruby and Selenium

This article shows how to scrape a website with Ruby and Selenium, using CSS class selectors to locate the data.

We are going to scrape the Stack Overflow questions page and pull out the following information:

  1. Question Text
  2. Answer Count
  3. Views
  4. Vote Count

First, download the ChromeDriver build that matches your installed Chrome version (this article only covers scraping with Google Chrome). Put the downloaded executable (chromedriver.exe on Windows) in the same directory as the Ruby script.
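Before starting the browser, it can help to confirm that the driver really is where the script expects it. The following is a minimal sketch, not part of Selenium: `chromedriver_path` is a hypothetical helper, and it assumes the file keeps its default name (`chromedriver`, or `chromedriver.exe` on Windows).

```ruby
# Hypothetical helper (not a Selenium API): build the expected path of the
# ChromeDriver executable that sits next to the script.
def chromedriver_path(dir = __dir__ || Dir.pwd)
  # Assumption: the download keeps its default file name.
  name = Gem.win_platform? ? "chromedriver.exe" : "chromedriver"
  File.expand_path(name, dir)
end

path = chromedriver_path
if File.exist?(path)
  puts "Found ChromeDriver at #{path}"
else
  puts "ChromeDriver not found at #{path}"
end
```

If the file is reported as missing, check its name and location before running the scraper.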

Go to the questions page and open the Inspect Element window. The main element to focus on is the list item.

Get all the elements with the class “question-summary”.

questions = @driver.find_elements(class: "question-summary")

Now loop over the questions and, inside each one, find the elements with the classes “question-hyperlink”, “vote-count-post”, “status”, and “views” to get the question text, vote count, answer count, and views respectively.

questions.each do |question|
  question_text = question.find_element(class: "question-hyperlink").text
  puts question_text
  vote_count = question.find_element(class: "vote-count-post").text
  puts vote_count
  answer_count = question.find_element(class: "status").text
  puts answer_count
  views = question.find_element(class: "views").text
  puts views
end
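The loop above prints each value as the text shown on the page, so counts arrive as strings such as "3 answers". If you want numbers instead, a small conversion step is one option. The sketch below is an assumption-laden example: `parse_count` and `MULTIPLIERS` are hypothetical names, and the handling of "k" and "m" suffixes is a guess at how the page abbreviates large numbers.

```ruby
# Assumed abbreviations for large counts, e.g. "1k views".
MULTIPLIERS = { "k" => 1_000, "m" => 1_000_000 }.freeze

# Hypothetical helper: convert display text such as "3 answers" or
# "1k views" into an Integer. Returns nil when the text has no number.
def parse_count(text)
  cleaned = text.to_s.downcase.delete(",")
  match = cleaned.match(/(-?\d+(?:\.\d+)?)\s*([km])?\b/)
  return nil unless match

  number = match[1].to_f
  multiplier = MULTIPLIERS.fetch(match[2], 1)
  (number * multiplier).round
end

puts parse_count("3 answers")
puts parse_count("1k views")
puts parse_count("-2")
```

Inside the loop you could then write, for example, `parse_count(views)` instead of printing the raw text.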

The complete code will look like this:

require 'selenium-webdriver'

class Scraper
  def initialize
    @long = 3
    @root_path = "https://stackoverflow.com/questions"
    @driver = Selenium::WebDriver.for :chrome, driver_path: './chromedriver'
  end

  def main
    @driver.navigate.to @root_path
    get_questions
    sleep @long # wait @long seconds before closing the browser

    @driver.quit
  end

  def get_questions
    questions = @driver.find_elements(class: "question-summary")
    questions.each do |question|
      question_text = question.find_element(class: "question-hyperlink").text
      puts question_text
      vote_count = question.find_element(class: "vote-count-post").text
      puts vote_count
      answer_count = question.find_element(class: "status").text
      puts answer_count
      views = question.find_element(class: "views").text
      puts views
    end
  end
end

start = Scraper.new
start.main
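The script above only prints the values. If you would rather keep them, one possible approach is to collect each question into a small record and write the records out as CSV. This is a sketch under stated assumptions: the `Question` struct and `questions_to_csv` are hypothetical names, and the sample row is placeholder data rather than real scraped output.

```ruby
require 'csv'

# Hypothetical record holding the four fields scraped for each question.
Question = Struct.new(:text, :votes, :answers, :views)

# Build a CSV string with a header row followed by one row per question.
def questions_to_csv(questions)
  CSV.generate do |csv|
    csv << %w[text votes answers views]
    questions.each { |question| csv << question.to_a }
  end
end

# Placeholder data standing in for values collected in get_questions.
sample = [Question.new("Example question title", "12", "3 answers", "1k views")]
puts questions_to_csv(sample)
```

In `get_questions`, you would build a `Question` from the four `.text` values instead of calling `puts`, then pass the collected array to `questions_to_csv` and save the result with `File.write`.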

Now you know how to scrape websites using Ruby and Selenium! Happy scraping!