In this final part of my series on building a recommendation engine for Reddit, I will explain how to use the similarity engine in a web application.
We left Part 3 with a fully functional similarity engine: given the set of subreddits a Redditor subscribes to, it returns the top N subreddits most similar to that initial set.
Step 4. Building the web application
To build the web application, we need to decide how to implement it. In my case, I chose to write it in Go, because it is a language that requires very little memory (compared to other higher-level languages such as Python), and also because Go is designed for concurrency (and we will need concurrency).
So first we start with the template that the client (i.e., the user who visits the site with a browser) receives:
index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Reddit Recommendation engine">
<meta name="author" content="@manugarri">
<title>Find a sub</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
<link href="css/custom.css" rel="stylesheet">
<link href='http://fonts.googleapis.com/css?family=Arvo:400,700|Signika:300' rel='stylesheet' type='text/css'>
<link rel="shortcut icon" href="images/favicon.ico">
</head>
<body>
<div class="jumbotron">
<div class="container">
<h1>Reddit recommendation engine</h1>
<p>A basic implementation of a recommendation engine. It can recommend personalized subreddits based on each sub userbase</p>
<h2>Discover new subreddits based on your subscriptions</h2>
<form action="/auth" class="new-entry">
<input type=submit value="Recommend me!" class="btn btn-primary">
</form>
</div>
</div>
<hr>
<footer>
<p><em>Made by <a href="http://www.manugarri.com" target="_blank">@manugarri</a> with &lt;3 for Reddit</em></p>
</footer>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/js/bootstrap.min.js"></script>
</body>
</html>
Nothing impressive here, just a landing page with a button that starts the Reddit OAuth process. When a user presses the "Recommend me!" button, the client sends a request to /auth (this is a relative path, so if the main page is findasub.manugarri.com, /auth resolves to http://findasub.manugarri.com/auth).
Let’s see how the main server file looks in Go:
package main
import (
"models"
"net/http"
"utils"
"github.com/codegangsta/negroni"
"github.com/coopernurse/gorp"
"github.com/gorilla/sessions"
"github.com/joho/godotenv"
"github.com/julienschmidt/httprouter"
)
var (
sessionStore *sessions.CookieStore
dbmap *gorp.DbMap
)
func main() {
err := godotenv.Load()
utils.CheckErr(err, "Error loading Environment Variables")
dbmap = models.InitDb()
cookieSecret := "a"
sessionStore = sessions.NewCookieStore([]byte(cookieSecret))
r := setupRoutes()
n := negroni.New()
n.Use(negroni.NewLogger())
n.Use(negroni.NewRecovery())
n.Use(negroni.NewStatic(http.Dir("public")))
n.UseHandler(r)
n.Run(":4000")
}
type AppContext struct {
Params httprouter.Params
Session *sessions.Session
DbMap *gorp.DbMap
}
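type AccessToken struct {
AccessToken string `json:"access_token"`
// Fields must be exported (capitalized) for encoding/json to populate them.
TokenType string `json:"token_type"`
ExpiresIn int `json:"expires_in"`
Scope string `json:"scope"`
}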
type HttpApiFunc func(c *AppContext, w http.ResponseWriter, r *http.Request)
func newAppContext(r *http.Request) *AppContext {
session, _ := sessionStore.Get(r, "developers")
return &AppContext{
Params: nil,
Session: session,
DbMap: dbmap,
}
}
func setupRoutes() *httprouter.Router {
router := httprouter.New()
router.POST("/results", makeHandler(resultsHandler))
router.GET("/authorize_callback", makeHandler(callbackHandler))
router.GET("/auth", makeHandler(oauthHandler))
router.GET("/", makeHandler(indexHandler))
return router
}
In any Go application, the function that runs the whole app is main.
In this case, main does the following things:
- Loads the environment variables (it’s always reasonable to keep passwords and other sensitive information in environment variables).
- Starts a connection with the database (the database information is implemented in the models folder).
- Starts a sessionStore that is used to write and read cookies (more on this later).
- Creates an httprouter router, an object that receives HTTP requests (for example, the /auth that we saw before) and dispatches them to the proper Go function. In the setupRoutes() function we add the endpoints that our app will listen to:
  - Every POST request to /results will be handled by the function resultsHandler.
  - Every GET request to /auth will be handled by the function oauthHandler.
  - etc.
- Creates a negroni object that acts as the middleman between the requests and httprouter.
- Adds a Logger (to log what is going on) and a Recovery to negroni (to recover from errors).
- Adds the public folder to negroni (so negroni can serve images, JavaScript and so on from the local “public” folder).
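The routes above wrap every handler in makeHandler, which is not shown in this post. A minimal sketch of what such an adapter could look like follows. It is an assumption, not the actual implementation: it uses only the standard library (http.ServeMux instead of httprouter), the AppContext here is a trimmed-down stand-in with a hypothetical SessionName field, and newMux and call are helper names invented for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// AppContext is a simplified stand-in for the context struct of the app.
// SessionName is a hypothetical field used only to show per-request state.
type AppContext struct {
	SessionName string
}

// HttpApiFunc mirrors the handler signature used throughout the post.
type HttpApiFunc func(c *AppContext, w http.ResponseWriter, r *http.Request)

// makeHandler adapts an HttpApiFunc to a standard http.HandlerFunc by
// building a fresh context for every incoming request.
func makeHandler(fn HttpApiFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		c := &AppContext{SessionName: "developers"}
		fn(c, w, r)
	}
}

// helloHandler is a toy handler that echoes the request path.
func helloHandler(c *AppContext, w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "hello from %s (session %s)", r.URL.Path, c.SessionName)
}

// newMux registers the wrapped handler, playing the role of setupRoutes().
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/auth", makeHandler(helloHandler))
	return mux
}

// call sends an in-memory GET request and returns the status code and body.
func call(h http.Handler, path string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := call(newMux(), "/auth")
	fmt.Println(status, body)
}
```

The point of the adapter is that each handler receives its session and database handle as an argument instead of reaching for global variables.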
As we said, once the user presses the button the browser requests /auth, which means that the function oauthHandler gets called.
That function lives in the handlers.go file.
handlers.go
package main
import (
"fmt"
"net/http"
"net/url"
"os"
"utils"
"github.com/gorilla/sessions"
)
func indexHandler(c *AppContext, w http.ResponseWriter, r *http.Request) {
http.ServeFile(w, r, "public/index.html")
}
func oauthHandler(c *AppContext, w http.ResponseWriter, r *http.Request) {
redirectUri := url.QueryEscape(os.Getenv("REDDIT_REDIRECT_URI"))
redditClientId := os.Getenv("REDDIT_CLIENT_ID")
stateString := os.Getenv("REDDIT_STATE")
oauthUrl := "https://ssl.reddit.com/api/v1/authorize?client_id=%s&response_type=code&state=%s&redirect_uri=%s&duration=temporary&scope=history,mysubreddits,identity"
oauthUrl = fmt.Sprintf(oauthUrl, redditClientId, stateString, redirectUri)
http.Redirect(w, r, oauthUrl, http.StatusSeeOther)
}
func callbackHandler(c *AppContext, w http.ResponseWriter, r *http.Request) {
queryParams := r.URL.Query()
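// Get returns "" when a parameter is missing, so a malformed callback cannot cause a panic.
if state := queryParams.Get("state"); state != "" && state == os.Getenv("REDDIT_STATE") {
authCode := queryParams.Get("code")
// Use the same session name as newAppContext so resultsHandler can read authCode back.
session, _ := sessionStore.Get(r, "developers")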
session.Values["done"] = false
session.Values["authCode"] = authCode
session.Save(r, w)
http.ServeFile(w, r, "public/results.html")
} else {
http.ServeFile(w, r, "public/index.html")
}
}
type Result struct {
NumberSubs int `json:"nsubs"`
Subs []string `json:"subs"`
}
func resultsHandler(c *AppContext, w http.ResponseWriter, r *http.Request) {
authCode := c.Session.Values["authCode"].(string)
nsubs, recommendedSubs := processReddit(authCode)
res := &Result{nsubs, recommendedSubs}
js := utils.MapToJSON(res)
c.Session.Options = &sessions.Options{MaxAge: -1}
c.Session.Save(r, w)
w.Header().Set("Content-Type", "application/json")
w.Write(js)
}
So we see that after the user presses the main button, oauthHandler starts the Reddit OAuth process by redirecting the user to https://ssl.reddit.com/api/v1/authorize.
At this point the Reddit API receives the request and asks the user whether to give the app permission to read the redditor’s name and subscriptions.
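As an alternative to assembling the authorization URL with fmt.Sprintf, the query string can be built with url.Values, which escapes every parameter for us. This is only a sketch: buildAuthorizeURL is a hypothetical helper name, and the client id, state and redirect URI in main are placeholder values.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildAuthorizeURL returns the Reddit authorization URL for the given app.
// The endpoint and the parameters are the same ones used by oauthHandler.
func buildAuthorizeURL(clientID, state, redirectURI string) string {
	q := url.Values{}
	q.Set("client_id", clientID)
	q.Set("response_type", "code")
	q.Set("state", state)
	q.Set("redirect_uri", redirectURI)
	q.Set("duration", "temporary")
	q.Set("scope", "history,mysubreddits,identity")
	// Encode escapes each value and sorts the parameters by key.
	return "https://ssl.reddit.com/api/v1/authorize?" + q.Encode()
}

func main() {
	// Placeholder values; the real app reads them from environment variables.
	fmt.Println(buildAuthorizeURL("my-client-id", "xyz", "http://localhost:4000/authorize_callback"))
}
```

Because every value is escaped in one place, there is no need to call url.QueryEscape on the redirect URI separately.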
If the redditor accepts, Reddit will redirect the browser with a GET request to the redirect URI defined on the Reddit API console (in this app, /authorize_callback; to read more about how to set up your own Reddit app and about the OAuth process, you can check this link).
That request will go to callbackHandler with an additional code parameter: an authorization code that the app can exchange for an access_token. That token allows making requests on behalf of the user (for the permissions specified on Reddit’s permission page) for an hour.
callbackHandler stores the authorization code in a cookie so it can be retrieved in the next step, and returns the results.html file if the OAuth process has been completed successfully.
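The exchange of the authorization code for an access token happens inside processReddit, which is not shown in this post. The sketch below illustrates what that step could look like under the standard OAuth2 authorization-code flow; it is an assumption, not the actual code. newTokenRequest and decodeToken are hypothetical helper names, the token endpoint is the one I understand Reddit's OAuth2 documentation to describe, and the credentials and JSON body in main are made-up sample values. The request is only built here, never sent.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// AccessToken holds the fields of the token response. All fields are
// exported so that encoding/json can fill them in.
type AccessToken struct {
	AccessToken string `json:"access_token"`
	TokenType   string `json:"token_type"`
	ExpiresIn   int    `json:"expires_in"`
	Scope       string `json:"scope"`
}

// newTokenRequest builds (but does not send) the POST request that trades
// an authorization code for an access token.
func newTokenRequest(endpoint, clientID, clientSecret, code, redirectURI string) (*http.Request, error) {
	form := url.Values{}
	form.Set("grant_type", "authorization_code")
	form.Set("code", code)
	form.Set("redirect_uri", redirectURI)
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	// The app authenticates itself with HTTP basic auth.
	req.SetBasicAuth(clientID, clientSecret)
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

// decodeToken parses the JSON body of a token response.
func decodeToken(body []byte) (AccessToken, error) {
	var tok AccessToken
	err := json.Unmarshal(body, &tok)
	return tok, err
}

func main() {
	// Assumed endpoint and placeholder credentials.
	req, err := newTokenRequest("https://www.reddit.com/api/v1/access_token",
		"my-client-id", "my-secret", "code-from-callback", "http://localhost:4000/authorize_callback")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)

	// Made-up sample response, used only to exercise the decoder.
	sample := []byte(`{"access_token":"abc123","token_type":"bearer","expires_in":3600,"scope":"identity"}`)
	tok, err := decodeToken(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(tok.AccessToken, tok.ExpiresIn)
}
```

In a real handler the request would be sent with an http.Client and the response body passed to decodeToken.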
results.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Reddit Recommendation Engine">
<meta name="author" content="@manugarri">
<title>Find a sub</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
<link href="css/custom.css" rel="stylesheet">
<link href='http://fonts.googleapis.com/css?family=Arvo:400,700|Signika:300' rel='stylesheet' type='text/css'>
<link rel="shortcut icon" href="images/favicon.ico">
</head>
<body>
<script src="//ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script type=text/javascript>
$( document ).ready(function() {
$('#fail').hide();
$('#results').text('');
$('#loading').show();
$('#search_header').hide();
$('#search_msg').text("Please wait while I fetch your recommendations");
$('#search_msg').show();
$('#fail').hide();
$('.results2').empty();
$.post('/results', {
text: $('#query').val(),
}).done(function(result) {
$('#loading').hide();
$('#search_msg').hide();
$('#n_subs').text("After analyzing " + result.nsubs + " Subreddits you like,");
$('#search_header').text("these are the Subreddits recommended for you:");
$('#search_header').show();
$.each( result.subs, function( i, item ) {
$( ".results2" ).append( "<p><a class='results_subs' target='_blank' href='http://www.reddit.com/r/" + item + "'>"+ item + "</a></p>" );
});
})
});
</script>
<!-- Main jumbotron -->
<div class="jumbotron">
<div class="container">
<h2>The Engine is calculating your results</h2>
<p><em>On the meanwhile, you might wanna check the <a href="http://blog.manugarri.com/building-a-recommendation-engine-for-reddit-part-1/" target="_blank">blog post</a> where I explain how I built the engine</em></p>
</div>
</div>
<div class="container">
<!-- Results area -->
<div class="results">
<h2 id="n_subs"></h2>
<h2 id="search_msg"></h2>
<h2 id="search_header"></h2>
<img id="loading" style="display: none" src="images/dog_working.gif">
<p><strong><span id="results">{{results}}</span></strong></p>
<div class="results2">
{% for sub in data %}
<p><a class='results2' target='_blank' href="http://www.reddit.com/r/{{ sub }}"> {{ sub }} </a></p>
{% endfor %}
</div>
</div>
</div>
<hr>
<footer>
<p><em>Made by <a href="http://www.manugarri.com" target="_blank">@manugarri</a> with &lt;3 for Reddit</em></p>
</footer>
</body>
</html>
On the results page we follow the same structure as in index.html. However, we make heavy use of jQuery to:
- Display a funny gif of a dog working on a computer.
- Send a POST request to /results.
- Wait until that endpoint has finished.
- Add a list of links with the returned results.
It is important to note that the POST request is done asynchronously; that is, the rest of the page can load without waiting for the results of that request.
That POST request is processed (as defined in setupRoutes()) by resultsHandler, which does the following:
- Gets the authorization code for the user from the cookie we wrote before.
- Calls processReddit(), which does all the similarity engine logic that we explained in part 3 of this series.
- Returns the number of analyzed subreddits and the list of recommended subreddits as JSON.
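The last step above relies on utils.MapToJSON, which is not shown in this post. As a rough sketch of the same idea using only encoding/json, the response could be produced as follows. resultJSON and writeResult are hypothetical helper names, and the subreddit names in main are sample data.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Result is the payload that the results page reads as result.nsubs and result.subs.
type Result struct {
	NumberSubs int      `json:"nsubs"`
	Subs       []string `json:"subs"`
}

// resultJSON serializes the recommendations. A nil slice is replaced by an
// empty one so the client receives [] instead of null.
func resultJSON(nsubs int, subs []string) ([]byte, error) {
	if subs == nil {
		subs = []string{}
	}
	return json.Marshal(Result{NumberSubs: nsubs, Subs: subs})
}

// writeResult sends the serialized recommendations as a JSON response.
func writeResult(w http.ResponseWriter, nsubs int, subs []string) {
	js, err := resultJSON(nsubs, subs)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(js)
}

func main() {
	js, err := resultJSON(2, []string{"golang", "python"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(js)) // prints {"nsubs":2,"subs":["golang","python"]}
}
```

Returning an empty array rather than null keeps the client-side loop over result.subs simple when there are no recommendations.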
Once the jQuery $.post has finished, the .done() callback is called, and jQuery adds a link to each recommended subreddit.
And that’s all! I hope I have explained the process in enough depth. You can check the whole code on my GitHub repository. You can always ping me on Twitter with any questions!