December 12, 2016

HDFS vs HBase in PySpark 2.0

It’s been a challenging period at Tribe. In case you don’t know, I am in charge of selecting and implementing the architecture for an ad exchange with a focus on latency and performance. As fun as it sounds, it is incredibly challenging: every tooling decision not only affects current performance but, given the sheer amount of data, instantly creates technical debt (data migrations are always a pain, but with big data they are a big pain, hehe).

December 6, 2016

Weird behavior of class methods vs. static methods in PySpark

Note: I’m using Spark 2.0.0 with Python 2.7. I just found some very weird behavior in PySpark, and I will show it with an example. Who knows, maybe this can help someone else. I am processing a list of text files containing data in JSON Lines format. After some fiddling, I set up a basic class to process the files:

    class TestClassProcessor(object):
        def __init__(self):
            self.spark = SparkSession...getOrCreate()

        @staticmethod
        def parse_record(self, record):
            pass  # ... do something with record
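The excerpt stops before the weird behavior itself is described, so the following is only a hedged, Spark-free illustration of one difference that commonly matters here (an assumption, not necessarily the author’s finding). When a regular method such as self.parse_record is passed to a transformation like rdd.map, Python binds it to the instance, so PySpark has to serialize the whole instance, including attributes like self.spark; a static method is a plain function with nothing bound. The method names parse_record_bound and parse_record_static, the helper captured_instance, and the object() placeholder for the session are all hypothetical.

```python
class TestClassProcessor(object):
    """Hypothetical variant of the class above, with one method of each kind."""

    def __init__(self):
        # Placeholder standing in for a SparkSession (assumption: no Spark needed here).
        self.spark = object()

    def parse_record_bound(self, record):
        # Regular method: accessed through an instance, it is bound to that instance.
        return record.strip()

    @staticmethod
    def parse_record_static(record):
        # Static method: accessed through an instance, it is still a plain function.
        return record.strip()


def captured_instance(func):
    """Return the instance a callable is bound to (and would carry when pickled), or None."""
    return getattr(func, "__self__", None)


processor = TestClassProcessor()
bound = processor.parse_record_bound
static = processor.parse_record_static

print(captured_instance(bound) is processor)  # True: the processor, and its .spark, come along
print(captured_instance(static))              # None: nothing is bound
```

In PySpark terms, rdd.map(processor.parse_record_bound) would need to ship processor (and its session) to the executors, which typically fails because the SparkContext is meant to be used only on the driver, whereas rdd.map(TestClassProcessor.parse_record_static) ships just the function.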

June 19, 2016

Raspberry reminder

Tired of realizing at 4 am that I was supposed to go to bed at midnight, I took my Raspberry Pi out of its box and finally put it to good use. I made a small app that lets you send reminders from your main computer to headphones connected to the Raspberry Pi. From your terminal, you can run something like:

    remind go to sleep 3600

That will create an MP3 with the message “go to sleep!…
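The excerpt doesn’t show how the app is implemented. As a minimal sketch, assuming the last argument is the delay in seconds and everything before it is the message, the command-line parsing behind remind go to sleep 3600 could look like this; parse_remind_args and due_time are hypothetical names, and generating the MP3 and playing it on the Pi are left out.

```python
import time


def parse_remind_args(args):
    """Split e.g. ['go', 'to', 'sleep', '3600'] into ('go to sleep', 3600).

    Hypothetical convention: the last argument is the delay in seconds,
    and every argument before it is a word of the message.
    """
    if len(args) < 2:
        raise ValueError("usage: remind <message words...> <delay_seconds>")
    message = " ".join(args[:-1])
    delay_seconds = int(args[-1])  # raises ValueError if not an integer
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    return message, delay_seconds


def due_time(delay_seconds, now=None):
    """Unix timestamp at which the reminder should be played."""
    if now is None:
        now = time.time()
    return now + delay_seconds


message, delay = parse_remind_args("go to sleep 3600".split())
print(message, delay)  # go to sleep 3600
```

A fuller version would then hand the message and its due time to whatever text-to-speech and playback step runs on the Pi.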

June 18, 2016

Install CUDA on Ubuntu 16.04

For the record, my graphics card is a GTX 870M.

1. Install the NVIDIA-recommended driver. You can find out which one that is by running the command ubuntu-drivers devices, which tells you the recommended driver to install via apt-get (I installed nvidia-361).
2. Restart.
3. Download the CUDA Toolkit from here, or use the direct link to version 7.5.18, which is the one I used.
4. Run the file ./cuda_7.5.18_linux.run
5. Make sure to say yes to everything EXCEPT the prompt that reads: …
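Step 1 can be scripted. The sketch below assumes ubuntu-drivers devices prints lines shaped like "driver : nvidia-361 - distro non-free recommended" (the format may differ between Ubuntu releases), and picks the package flagged as recommended; recommended_driver, query_recommended_driver and the SAMPLE text are hypothetical and only illustrate that assumed format.

```python
import re
import subprocess

# Assumed line shape, e.g. "driver   : nvidia-361 - distro non-free recommended"
_DRIVER_LINE = re.compile(r"^driver\s*:\s*(\S+)\s+-\s+.*\brecommended\b")


def recommended_driver(output):
    """Return the package marked 'recommended' in ubuntu-drivers output, or None."""
    for line in output.splitlines():
        match = _DRIVER_LINE.match(line.strip())
        if match:
            return match.group(1)
    return None


def query_recommended_driver():
    """Run `ubuntu-drivers devices` (Ubuntu only) and parse its output."""
    output = subprocess.check_output(["ubuntu-drivers", "devices"])
    return recommended_driver(output.decode("utf-8"))


# Illustrative text in the assumed format (not real captured output).
SAMPLE = """\
vendor   : NVIDIA Corporation
driver   : nvidia-361 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
"""

print(recommended_driver(SAMPLE))  # nvidia-361
```

The returned package name is what you would then pass to sudo apt-get install.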

May 22, 2016

Video: A primer on recommendation systems

Last April I gave a talk about recommendation systems at PyData Madrid 2016. Here is the video in case you want to check it out, and here is the repository with the slides, code and data.
