Email and Spam Filtering

Have you ever signed up for an online service and suddenly been hit with a surge of spammy marketing emails? I mean the kind sent directly to you as part of a “customer retention” program after you buy something like airline tickets: the ones that keep pitching limited-time offers and credit card deals long after your purchase is complete. Here I’ll discuss some strategies for dealing with this under-the-radar spam.

Dijkstra (in Prague)

This post piggybacks on my previous post about graphs, DFS, and BFS. You may have noticed that each edge in those graphs also had a weight assigned to it. That’s because I wanted to move on to path-finding algorithms, starting with Dijkstra’s algorithm. There are plenty of resources online, such as Wikipedia’s description, that do a fairly good job of describing the algorithm.

However, none of them explain the algorithm with a real-world scenario, so I decided to describe it using a city so beautiful that it kept distracting me from where I was going, and I constantly got lost: Prague, Czech Republic.

A section of Prague with a graph overlay

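If you were planning a trip within a few blocks of that intersection and all you had was this map, how would you get from, say, point P to point W? Intuitively, you would start at P, look at all the edges connected to it (working from a table of edges rather than the illustrated map), move to the nearest intersection, and repeat. Dijkstra’s algorithm refines this intuition: at each step it settles the unvisited intersection with the lowest total distance from the start, not just the one at the end of the cheapest single edge.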
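The idea above can be sketched in a few lines of Python. This is a separate illustration, not the PHP code mentioned in this post: the graph is a hypothetical four-intersection example with made-up block lengths (not the actual Prague map), and `dijkstra` and `shortest_path` are illustrative names. It keeps a priority queue of tentative distances and always settles the closest unvisited intersection next, assuming all edge weights are non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, prev) for shortest paths from source.

    graph maps each node to a list of (neighbor, weight) pairs;
    weights are assumed to be non-negative.
    """
    dist = {source: 0}
    prev = {}
    visited = set()
    heap = [(0, source)]  # (tentative distance, node)
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale entry: this node was already settled via a shorter path
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, prev


def shortest_path(graph, source, target):
    """Return (path, length), or (None, inf) if target is unreachable."""
    dist, prev = dijkstra(graph, source)
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])  # walk the predecessor links back to the start
    path.reverse()
    return path, dist[target]


# Hypothetical intersections and block lengths; NOT the actual Prague map.
streets = {
    "P": [("Q", 2), ("R", 5)],
    "Q": [("P", 2), ("R", 1), ("W", 7)],
    "R": [("P", 5), ("Q", 1), ("W", 3)],
    "W": [("Q", 7), ("R", 3)],
}

path, length = shortest_path(streets, "P", "W")
print(path, length)
```

With these made-up weights the direct-looking route P → R → W costs 8, while the sketch finds P → Q → R → W at a cost of 6.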

Here is a PDF of Dijkstra’s Algorithm in Action and an animated GIF.

Dijkstra Animation

And finally, the updated code in PHP.

Boot Camp Success!

I finally got around to installing Windows XP through Boot Camp and will document my experience here. I am running a 2007 MacBook Pro (2.4 GHz, 4 GB of RAM, NVIDIA GeForce 8600M GT) with OS X Leopard. The point is to have a computer on which my wife can play Team Fortress 2 with me. Her idea. Honest. 🙂

As I indicated in an earlier post, my first hurdle was to get my MacBook’s hard drive defragmented enough for Boot Camp to be able to partition it. In theory, the HFS+ filesystem used by Mac OS X does not get fragmented due to how it was designed and how OS X uses the filesystem (read all about it on Apple’s website:

As one of my favorite professors once said: “The difference between practice and theory is small in theory but large in practice.”

Linux and OS X Miscellaneous Stuff

To “burn” a .iso or .img file to a USB device, type:

sudo dd if=bt4-pre-final.iso of=/dev/disk1 bs=1m

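where /dev/disk1 is your target USB drive. Double-check it: dd overwrites whatever device you give it without asking. In OS X you can find the device name by opening Disk Utility, selecting your target drive, and viewing its info, or by running `diskutil list`; if the drive is mounted, unmount it first with `diskutil unmountDisk /dev/disk1`. In Linux, run `dmesg` right after plugging in the drive, and the kernel messages should give you the drive name (e.g. /dev/sdb). Note that GNU dd on Linux expects `bs=1M` rather than `bs=1m`.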
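To make `bs=1m` concrete: dd reads the input 1 MiB at a time and writes each block to the output until the input runs out. Below is a minimal Python sketch of that loop, for illustration only. `block_copy` is a hypothetical helper, not part of any tool, and it is meant for experimenting with ordinary files; for an actual USB drive I would stick with dd.

```python
import os


def block_copy(src_path, dst_path, block_size=1024 * 1024):
    """Copy src_path to dst_path in fixed-size blocks, like `dd bs=1m`.

    Returns the number of bytes copied. Hypothetical helper for illustration.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)  # at most block_size bytes per read
            if not block:
                break  # end of input
            dst.write(block)
            copied += len(block)
        dst.flush()
        os.fsync(dst.fileno())  # ask the OS to push the data to the disk before returning
    return copied
```

The last block is usually shorter than `block_size`, which is why the loop stops on an empty read rather than a short one.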

I will be editing this to add more stuff.

Master’s Project/Thesis

Today, I think I finally picked out my Master’s project, with the help of Dr. Michael Shafae. I will be working toward the $1 million Netflix Prize. Yeah, I am probably late, given that the contest went live in 2006, but a winning solution (one that beats Netflix’s current recommendation system, Cinematch, by 10% as measured by root mean squared error) hasn’t been found yet.
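The Netflix Prize scored entries by root mean squared error (RMSE) between predicted and actual star ratings, so the 10% target means an RMSE at least 10% lower than Cinematch’s. A minimal Python sketch of that arithmetic follows; the ratings are made-up values for illustration, and `rmse` and `improvement` are hypothetical helper names.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline system."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Made-up ratings for illustration only (not Netflix data).
actual = [4, 3, 5, 2]
baseline = [3.5, 3.5, 4.0, 3.0]  # stand-in for an existing recommender
mine = [4.0, 3.0, 4.5, 2.5]      # stand-in for a new model

print(improvement(rmse(baseline, actual), rmse(mine, actual)))
# A prize-winning model would need improvement(...) >= 10 on the hidden ratings.
```

Because RMSE squares each error, a few badly wrong predictions hurt far more than many slightly wrong ones.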

This raises an interesting question: what happens if someone comes up with the answer while I am in the middle of my research? On the one hand, I wouldn’t win the grand prize; on the other hand, I shouldn’t be doing this for the money anyway, but for the sake of knowledge.

Luckily, Dr. Shafae suggested that we work on visualization techniques for the data in parallel and publish a few papers on those techniques specifically (of course he would suggest that; he’s a graphics professor to begin with!), and, more generally, publish the data mining techniques I end up exploring.

Plus, there’s absolutely no reason I can’t try to improve on the techniques beyond the 10% mark (if I ever even get that close).

However, if I ever do reach the 10% mark during my graduate career, only to find that someone else got there first, I will be kicking myself forever and ever for not starting earlier and taking home the $1M myself.

By the way, the whole reason I got into this (besides, obviously, the prize and a feeling that I have a shot against those BellKor guys) is that Dr. Shafae and I got into a conversation about Singular Value Decomposition (SVD) and its real-world applications. I was just curious what it was, since it appeared in the textbook, and that’s how the whole thing got started.

Anyway, it’s good to finally have a professor on my side, since I am really good at starting projects and never finishing them. I am sure he will make me show progress every week, which is definitely a good thing. In true professor/mentor fashion, he has already shown me how to keep my references organized using BibDesk (think of it as painting fences, washing cars, and sanding decks).

Buying Houses

Last week, we went to an open house in our townhouse complex. The price was $389,000 for a 2-bedroom, 1.5-bath townhouse with a “stream” view (read: a pipe with flowing water). The seller was nice to us despite knowing that we were not in the market to buy, just curious. She told us it was 1,200 sq. ft. (which, as I found out online later, was a lie: it is listed at under 1,000 sq. ft.).

Anyway, we started talking about saving enough money for a down payment in this crazy housing market, and she handed us the card of the lender she works with and told us to give her a call. She said we’d be surprised how much we could afford. Alarm bells rang. I would be surprised how much I can afford? Really?

Using an online mortgage calculator, it turns out I’d need to pay ~$2,900/month on a $389,000 mortgage. That means needing an income of almost $130,000 per year once you factor in car and student loan payments (who doesn’t have those?). And what about bills? Electricity? Gas? Food? Fuel? Living expenses? HOA?
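For reference, a fixed-rate mortgage calculator applies the standard amortization formula M = P * r * (1 + r)^n / ((1 + r)^n - 1), where P is the loan amount, r is the monthly interest rate, and n is the number of monthly payments. The post doesn’t say which rate or term the calculator used, so the 30-year term and 8% rate in this Python sketch are assumptions, and property taxes, insurance, and HOA dues are not included. `monthly_payment` is a hypothetical helper name.

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed-rate amortized monthly payment (principal and interest only).

    Uses M = P * r * (1 + r)**n / ((1 + r)**n - 1), where r is the monthly
    rate and n is the number of monthly payments.
    """
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        return principal / n  # no interest: just split the principal evenly
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


# Assumed terms: the rate and term behind the original calculation are unknown.
payment = monthly_payment(389_000, 0.08, 30)
print(f"${payment:,.2f} per month, ${payment * 12:,.2f} per year")
```

Under those assumed terms the payment comes out a little under $2,900 per month, in the same ballpark as the calculator’s figure; a lower rate or a longer term shrinks it, which is exactly the lever that 60-year loans pull.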

This is why the housing market is on the downswing: this sort of pricing simply isn’t sustainable. There are a million and one ways to finance something you can’t afford, but in the end, you still can’t afford it! People are financing $500K houses with interest-only loans, ARMs (adjustable-rate mortgages), or 60-year loans. SIXTY YEARS! These are all signs that the houses are way, WAY overpriced.

The best you can do in these situations is find a cheap place to rent and weather this storm. And when a loan agent tells you that you’d be surprised at how much house you can afford, don’t buy into it. As for that townhouse, it’s still on the market after 2 1/2 months.

Mac OS X Review

Three weeks ago I acquired a brand-new MacBook Pro. It is my first Apple product; I have been a Windows and Linux user and developer for as long as I can remember.

I will try to document my switching experience as objectively as possible. I will review the hardware, software, and the GUI experience as it relates to my work and leisure.

In short, for a maker that touts usability as its #1 feature, Apple has a lot of work to do from a usability standpoint, but overall I am relatively happy with the MacBook Pro.
