December 12, 2016
HDFS vs HBase in PySpark 2.0
It’s been a challenging period at Tribe. In case you don’t know, I am in charge of selecting and implementing the architecture for an ad exchange with a focus on latency and performance.
As fun as it sounds, it is incredibly challenging. Every tooling decision not only affects current performance but, given the sheer amount of data, also creates technical debt instantly (data migrations are always a pain, but when dealing with big data they are a big pain, hehe).