December 1, 2014

Building a Recommendation Engine for Reddit. Part 4

In this final part of my series about building a recommendation engine for Reddit, I will explain how to use the similarity engine in a web application.

We left Part 3 with a fully functional similarity engine that, given a set of subreddits for a Redditor, returns the top N subreddits most similar to that initial set.

Step 4. Building the web application

To build the web application, we need to decide how to implement it. In my case, I chose to write it in Go, because it is a language that requires very little memory (compared to other higher-level languages such as Python), and also because Go is designed for concurrency (and we will need concurrency).

So first we start with the template that the client (i.e., the user who visits the site with their browser) sees:

index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="description" content="Reddit Recommendation engine">
    <meta name="author" content="@manugarri">
    <title>Find a sub</title>
    
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
    <link href="css/custom.css" rel="stylesheet">
    <link href='http://fonts.googleapis.com/css?family=Arvo:400,700|Signika:300' rel='stylesheet' type='text/css'>
    <link rel="shortcut icon" href="images/favicon.ico">
  </head>

  <body>
    <div class="jumbotron">
      <div class="container">
        <h1>Reddit recommendation engine</h1>
        <p>A basic implementation of a recommendation engine. It can recommend personalized subreddits based on each sub userbase</p>

      <h2>Discover new subreddits based on your subscriptions</h2>
      <form action="/auth" class="new-entry">
        <input type=submit value="Recommend me!" class="btn btn-primary">
      </form>
      </div>
    </div>
      <hr>
      <footer>
        <p><em>Made by </em><a href="http://www.manugarri.com" target="_blank">@manugarri</a> with &lt;3 for Reddit</p>
      </footer>
    <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/js/bootstrap.min.js"></script>
  </body>
</html>

Nothing impressive here, just a landing page with a button that starts the Reddit OAuth process. When a user presses the "Recommend me!" button, the client sends a request to /auth (this is a relative path, so if the main page is findasub.manugarri.com, /auth resolves to http://findasub.manugarri.com/auth).
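To make the relative path point concrete, here is a small sketch of how a form action is resolved against the page address, using Go's net/url package. The resolve helper is a name I made up for this example; it is not part of the app.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve returns the absolute URL a browser would request when a page
// served at base submits a form whose action is ref.
// (Hypothetical helper, for illustration only.)
func resolve(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	abs, err := resolve("http://findasub.manugarri.com/", "/auth")
	if err != nil {
		panic(err)
	}
	fmt.Println(abs) // http://findasub.manugarri.com/auth
}
```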

Let’s see how the main server file looks in Go:

package main

import (
	"models"
	"net/http"
	"utils"

	"github.com/codegangsta/negroni"
	"github.com/coopernurse/gorp"
	"github.com/gorilla/sessions"
	"github.com/joho/godotenv"
	"github.com/julienschmidt/httprouter"
)

var (
	sessionStore *sessions.CookieStore
	dbmap        *gorp.DbMap
)

func main() {
	err := godotenv.Load()
	utils.CheckErr(err, "Error loading Environment Variables")

	dbmap = models.InitDb()
	cookieSecret := "a" // placeholder: in production use a long random key loaded from the environment
	sessionStore = sessions.NewCookieStore([]byte(cookieSecret))

	r := setupRoutes()

	n := negroni.New()
	n.Use(negroni.NewLogger())
	n.Use(negroni.NewRecovery())
	n.Use(negroni.NewStatic(http.Dir("public")))
	n.UseHandler(r)
	n.Run(":4000")
}

type AppContext struct {
	Params  httprouter.Params
	Session *sessions.Session
	DbMap   *gorp.DbMap
}

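// AccessToken holds the token response. Fields must be exported and tagged,
// otherwise encoding/json cannot fill them.
type AccessToken struct {
	AccessToken string `json:"access_token"`
	TokenType   string `json:"token_type"`
	ExpiresIn   int    `json:"expires_in"`
	Scope       string `json:"scope"`
}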

type HttpApiFunc func(c *AppContext, w http.ResponseWriter, r *http.Request)

func newAppContext(r *http.Request) *AppContext {
	session, _ := sessionStore.Get(r, "developers")
	return &AppContext{
		Params:  nil,
		Session: session,
		DbMap:   dbmap,
	}
}

func setupRoutes() *httprouter.Router {
	router := httprouter.New()
	router.POST("/results", makeHandler(resultsHandler))
	router.GET("/authorize_callback", makeHandler(callbackHandler))
	router.GET("/auth", makeHandler(oauthHandler))
	router.GET("/", makeHandler(indexHandler))
	return router
}

In any Go application, the function that runs the whole app is main. In this case, main does the following things:

  • Loads the environment variables (it is always sensible to keep passwords and other sensitive information in environment variables).
  • Starts a connection with the database (the database information is implemented in the models folder).
  • Starts a sessionStore that is used to write and read cookies (more on this later).
  • Creates an httprouter router, an object that receives HTTP requests (for example, the /auth that we saw before) and dispatches them to the proper Go function:
    • In the setupRoutes() function we add the endpoints that our app will listen to:
    • Every POST request to /results will be handled by the function resultsHandler.
    • Every GET request to /auth will be handled by the function oauthHandler.
    • etc.
  • Creates a negroni object that acts as the middleman between the requests and httprouter.
  • Adds a Logger (to log what is going on) and a Recovery middleware to negroni (to recover from panics).
  • Adds the public folder to negroni (so negroni can serve images, JavaScript and other static files from the local “public” folder).
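The routes above are wrapped in makeHandler, which is not shown in this post. A minimal sketch of what such an adapter could look like follows. To keep it runnable without dependencies I use the standard library's http.ServeMux instead of httprouter, and the trimmed-down AppContext, echoHandler, newMux and get are stand-ins I invented for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// AppContext is a trimmed-down stand-in; the real struct also carries the
// router params, the session and the database map.
type AppContext struct {
	RequestPath string
}

// HttpApiFunc mirrors the handler signature used throughout the app.
type HttpApiFunc func(c *AppContext, w http.ResponseWriter, r *http.Request)

// makeHandler adapts an HttpApiFunc to a standard http.HandlerFunc by
// building a fresh context for every request. (Assumed shape, not the
// actual implementation from the repository.)
func makeHandler(fn HttpApiFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		c := &AppContext{RequestPath: r.URL.Path}
		fn(c, w, r)
	}
}

// echoHandler is a hypothetical handler that reports which path it served.
func echoHandler(c *AppContext, w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "handled %s", c.RequestPath)
}

// newMux registers the routes, playing the role of setupRoutes.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/auth", makeHandler(echoHandler))
	return mux
}

// get sends an in-memory GET request to the mux and returns status and body.
func get(mux *http.ServeMux, path string) (int, string) {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := get(newMux(), "/auth")
	fmt.Println(code, body) // 200 handled /auth
}
```

The point of the adapter is that every handler receives the same context (session, database) without relying on global lookups inside each function.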

So we said that once the user presses the button they are sent to /auth, which we know means that the function oauthHandler gets called.

That function is in the handlers.go file.

handlers.go


package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"utils"

	"github.com/gorilla/sessions"
)

func indexHandler(c *AppContext, w http.ResponseWriter, r *http.Request) {
	http.ServeFile(w, r, "public/index.html")
}

func oauthHandler(c *AppContext, w http.ResponseWriter, r *http.Request) {
	redirectUri := url.QueryEscape(os.Getenv("REDDIT_REDIRECT_URI"))
	redditClientId := os.Getenv("REDDIT_CLIENT_ID")
	stateString := os.Getenv("REDDIT_STATE")
	oauthUrl := "https://ssl.reddit.com/api/v1/authorize?client_id=%s&response_type=code&state=%s&redirect_uri=%s&duration=temporary&scope=history,mysubreddits,identity"
	oauthUrl = fmt.Sprintf(oauthUrl, redditClientId, stateString, redirectUri)
	http.Redirect(w, r, oauthUrl, http.StatusSeeOther)
}

func callbackHandler(c *AppContext, w http.ResponseWriter, r *http.Request) {
	queryParams := r.URL.Query()
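	// Get returns "" for a missing parameter instead of panicking on an empty slice.
	if state := queryParams.Get("state"); state != "" && state == os.Getenv("REDDIT_STATE") {
		authCode := queryParams.Get("code")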
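		// Same session name as newAppContext (assuming makeHandler builds the
		// context with it), so resultsHandler can read authCode back.
		session, _ := sessionStore.Get(r, "developers")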
		session.Values["done"] = false
		session.Values["authCode"] = authCode
		session.Save(r, w)

		http.ServeFile(w, r, "public/results.html")
	} else {
		http.ServeFile(w, r, "public/index.html")
	}
}

type Result struct {
	NumberSubs int      `json:"nsubs"`
	Subs       []string `json:"subs"`
}

func resultsHandler(c *AppContext, w http.ResponseWriter, r *http.Request) {
	authCode := c.Session.Values["authCode"].(string)
	nsubs, recommendedSubs := processReddit(authCode)
	res := &Result{nsubs, recommendedSubs}
	js := utils.MapToJSON(res)
	c.Session.Options = &sessions.Options{MaxAge: -1}
	c.Session.Save(r, w)
	w.Header().Set("Content-Type", "application/json")
	w.Write(js)
}

So we see that after pressing the main button, the oauthHandler starts the Reddit OAuth process by redirecting the user to https://ssl.reddit.com/api/v1/authorize

At this point the Reddit API receives the request and asks the user whether they want to grant the app permission to read the redditor’s name and subscriptions.
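As a side note, the authorization URL can also be assembled with url.Values, which escapes every parameter for us. This is a sketch of an alternative to the fmt.Sprintf approach in oauthHandler, not what the app does; buildAuthorizeURL and the values in main are placeholders I made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildAuthorizeURL assembles the Reddit authorization URL. url.Values
// escapes each parameter, so no manual QueryEscape call is needed.
// (Hypothetical helper, for illustration only.)
func buildAuthorizeURL(clientID, state, redirectURI string) string {
	q := url.Values{}
	q.Set("client_id", clientID)
	q.Set("response_type", "code")
	q.Set("state", state)
	q.Set("redirect_uri", redirectURI)
	q.Set("duration", "temporary")
	q.Set("scope", "history,mysubreddits,identity")
	return "https://ssl.reddit.com/api/v1/authorize?" + q.Encode()
}

func main() {
	// Placeholder values; the app reads the real ones from the environment.
	fmt.Println(buildAuthorizeURL("my-client-id", "some-state", "http://localhost:4000/authorize_callback"))
}
```

Note that Encode sorts the parameters by key, so the order differs from the hand-written URL, which should not matter to the server.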

If the redditor accepts, Reddit redirects the browser with a GET request to the redirect URI defined in the Reddit API console, which in our routes is /authorize_callback (to read more about how to set up your own Reddit App and about the OAuth process, you can check this link).

That request will go to callbackHandler, carrying an authorization code that the app can exchange for an access_token. That token allows the app to make requests on behalf of the user (for the permissions specified on Reddit’s permission page) for an hour.
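processReddit is not shown in this post, but since callbackHandler stores the code query parameter, that code has to be exchanged for a token before calling the API. Here is a minimal sketch of that exchange, assuming Reddit's token endpoint at https://www.reddit.com/api/v1/access_token with HTTP basic auth. The exchangeCode and demo functions, the user agent and all the values are made up for the example, and demo talks to a local fake server rather than to Reddit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// AccessToken mirrors the JSON body returned by the token endpoint.
type AccessToken struct {
	AccessToken string `json:"access_token"`
	TokenType   string `json:"token_type"`
	ExpiresIn   int    `json:"expires_in"`
	Scope       string `json:"scope"`
}

// exchangeCode trades an authorization code for an access token.
// For Reddit, tokenURL would be https://www.reddit.com/api/v1/access_token.
func exchangeCode(tokenURL, clientID, secret, code, redirectURI string) (*AccessToken, error) {
	form := url.Values{}
	form.Set("grant_type", "authorization_code")
	form.Set("code", code)
	form.Set("redirect_uri", redirectURI)

	req, err := http.NewRequest(http.MethodPost, tokenURL, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(clientID, secret)
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.Header.Set("User-Agent", "findasub-example/0.1") // hypothetical user agent

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("token endpoint returned %s", resp.Status)
	}
	var tok AccessToken
	if err := json.NewDecoder(resp.Body).Decode(&tok); err != nil {
		return nil, err
	}
	return &tok, nil
}

// demo runs exchangeCode against a local fake endpoint, not against Reddit.
func demo() (*AccessToken, error) {
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _, ok := r.BasicAuth()
		if !ok || r.FormValue("grant_type") != "authorization_code" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"access_token":"fake-token","token_type":"bearer","expires_in":3600,"scope":"identity"}`)
	}))
	defer fake.Close()
	return exchangeCode(fake.URL, "id", "secret", "some-code", "http://localhost:4000/authorize_callback")
}

func main() {
	tok, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(tok.AccessToken, tok.ExpiresIn) // fake-token 3600
}
```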

callbackHandler stores the authorization code in a session cookie so that the next step can retrieve it, and returns the results.html file if the state parameter matches the one we sent (otherwise it serves index.html again).

results.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="description" content="Reddit Recommendation Engine">
    <meta name="author" content="@manugarri">
    <title>Find a sub</title>
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
    <link href="css/custom.css" rel="stylesheet">
    <link href='http://fonts.googleapis.com/css?family=Arvo:400,700|Signika:300' rel='stylesheet' type='text/css'>
    <link rel="shortcut icon" href="images/favicon.ico">
  </head>
  <body>
<script src="//ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
  <script type=text/javascript>
$( document ).ready(function() {
    $('#fail').hide();
    $('#results').text('');
    $('#loading').show();
    $('#search_header').hide();
    $('#search_msg').text("Please wait while I fetch your recommendations");
    $('#search_msg').show();
    $('.results2').empty();
    $.post('/results', {
        text: $('#query').val()
    }).done(function(result) {
        $('#loading').hide();
        $('#search_msg').hide();
        $('#n_subs').text("After analyzing " + result.nsubs + " Subreddits you like,");
        $('#search_header').text("these are the Subreddits recommended for you:");
        $('#search_header').show();
        $.each( result.subs, function( i, item ) {
          $( ".results2" ).append( "<p><a class='results_subs' target='_blank' href='http://www.reddit.com/r/" + item + "'>"+ item + "</a></p>" );
        });
    });
});
  </script>

    <!-- Main jumbotron -->
    <div class="jumbotron">
      <div class="container">
        <h2>The Engine is calculating your results</h2>
        <p><em>In the meantime, you might want to check the <a href="http://blog.manugarri.com/building-a-recommendation-engine-for-reddit-part-1/" target="_blank">blog post</a> where I explain how I built the engine</em></p>
      </div>
    </div>
    <div class="container">
      <!-- Results area -->
          <div class="results">
            <b><h2 id="n_subs"></h2></b>
            <h2 id="search_msg"></h2>
            <h2 id="search_header"></h2>
            <img id="loading" style="display: none" src="images/dog_working.gif">
            <p><strong><span id="results">{{results}}</span></strong></p>
            <div class="results2">
              {% for sub in data %}
              <p><a class='results2' target='_blank' href="http://www.reddit.com/r/{{ sub }}"> {{ sub }} </a></p>
              {% endfor %}
            </div>
          </div>
    </div>
      <hr>

      <footer>
        <p><em>Made by </em><a href="http://www.manugarri.com" target="_blank">@manugarri</a> with &lt;3 for Reddit</p>
      </footer>
  </body>
</html>

On the results page we follow the same structure as in index.html. However, we make heavy use of jQuery to:

  • Display a funny gif of a dog working on a computer.
  • Send a POST request to /results.
  • Wait until that endpoint has finished.
  • Add a list of links with the returned results.

It is important to note that the POST request is done asynchronously; that is, the rest of the page can load without waiting for the results of that request.

That POST request is processed (as defined in setupRoutes()) by resultsHandler, which does the following:

  • Gets the authorization code for the user from the session cookie we wrote before.
  • Runs processReddit(), which performs all the similarity engine logic that we explained in Part 3 of this series.
  • Returns the number of analyzed subreddits and the list of recommended subreddits as JSON.
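The utils.MapToJSON helper is not shown here, so as an illustration of the last step, this is a sketch of how the Result struct could be serialized with encoding/json. The writeResult and render functions are names I invented, and the subreddit names are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Result matches the struct in handlers.go.
type Result struct {
	NumberSubs int      `json:"nsubs"`
	Subs       []string `json:"subs"`
}

// writeResult serializes res and writes it as a JSON response.
// (Stand-in for the utils.MapToJSON + w.Write pair in resultsHandler.)
func writeResult(w http.ResponseWriter, res Result) {
	js, err := json.Marshal(res)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(js)
}

// render returns the body that writeResult produces for res.
func render(res Result) string {
	rec := httptest.NewRecorder()
	writeResult(rec, res)
	return rec.Body.String()
}

func main() {
	// Made-up subreddit names, for illustration only.
	fmt.Println(render(Result{NumberSubs: 2, Subs: []string{"golang", "datascience"}}))
	// {"nsubs":2,"subs":["golang","datascience"]}
}
```

The nsubs and subs keys are the ones the jQuery code reads as result.nsubs and result.subs.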

Once the jQuery $.post has finished, the .done() function is called, and jQuery adds a link to each recommended subreddit.

And that’s all! I hope I have explained the process in enough depth. You can check the whole code on my GitHub repository. You can always ping me on Twitter with any questions!
