Beware of False Positive Accumulation

At my first gig as a data scientist, at IMVU, I was tasked with building a random forest model to prevent credit card chargebacks. The model was pretty simple. It would build features about the transaction and the user and use those to estimate the probability of a chargeback. We had a large, beautiful dataset of previous chargebacks, because unfortunately at the time we were getting a lot of them. The model didn't need to be very complex to do a decent job; that was not the interesting part of the project.


The Real Job of a Data Scientist

On the surface, a data scientist does many things. You will be many things at different times: an engineer, an evangelist, a web dev and a database manager. Sometimes you will be a SQL monkey, other times a PM. You will build data products and (semi)intelligent systems. You will make machines able to learn and then teach them. You will clean data (so much cleaning).


Monoids in Python

Sometimes mathematicians come up with very obscure topics that seem like they will never be useful. Usually we realize, years later, that one of the mystical toys mathematicians leave lying around is actually a perfectly shaped tool for the problem at hand. I want to introduce you to one of these tools, which has been gaining traction in recent years as a way to solve problems with big data.
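As a rough, hypothetical illustration (the excerpt does not say which examples the full post uses): a monoid is a set with an associative combine operation and an identity element. That pair of properties is what makes it handy for big data, because partial results computed on separate chunks can be merged in any grouping and still give the same answer. The sketch below assumes a running-mean accumulator; the names `Mean`, `IDENTITY` and `summarize` are made up for this example.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Mean:
    """Hypothetical (total, count) accumulator that forms a monoid under `combine`."""
    total: float = 0.0
    count: int = 0

    @staticmethod
    def of(x: float) -> "Mean":
        # Lift a single observation into the monoid.
        return Mean(x, 1)

    def combine(self, other: "Mean") -> "Mean":
        # Associative: a.combine(b).combine(c) == a.combine(b.combine(c))
        return Mean(self.total + other.total, self.count + other.count)

    def value(self) -> float:
        # The mean is only extracted at the very end.
        return self.total / self.count if self.count else float("nan")


# Identity element: IDENTITY.combine(m) == m for every m.
IDENTITY = Mean()


def summarize(xs):
    """Fold a sequence of numbers into a single Mean accumulator."""
    return reduce(Mean.combine, (Mean.of(x) for x in xs), IDENTITY)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Summarize two chunks independently (imagine two machines), then merge.
left, right = summarize(data[:3]), summarize(data[3:])
merged = left.combine(right)
print(merged.value())  # 3.5
```

Note the design choice: averaging two averages directly would be wrong when the chunks differ in size, so the accumulator carries the sum and count and divides only once at the end.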


Singapore to Scotland by Train

Things I lost on this trip:

  • My credit card and my debit card
  • About 40 lbs
  • Caution while crossing the street (also, I am not sure which way to look now)
  • The ability to drink brandy (I blame the Russians)
  • The soles of my shoes
  • Fear of drowning