Data flows through silicon like blood through our veins, an efficient system of arteries that delivers oxygen to the cells. The designer has to be careful to prevent clots, increase flow, draw oxygen in through the lungs, and finally have the right enzyme to deliver that oxygen wherever it is needed.

I build this kind of aerobic respiration unit for business data, and I love every last bit of my job. Seeing millions of transactions flow through a distributed cluster under the rules I set, into models that learn and adapt over time, and back out as individualized predictions for customers is something we could never have imagined at this scale even 10 years ago. But here we are, riding the wavefront of technology into the future.

My Work

I'm a Predictive Analytics engineer in AT&T BIZOPS, part of a small data analytics team named Tenzing, after the (relatively) unsung hero of the first successful ascent of Everest. The name fits our task: climbing an Everest-sized pile of data, choosing carefully where to step and which path to take. Luckily for us, stepping into a crevasse is far less catastrophic.

We have a homegrown stack written 99% in Python. It accumulates data from sources, stores it in Hadoop, transforms it with Spark and pandas, trains machine learning models (mostly with spark.mllib), and delivers the models to REST servers that serve our predictions to customers. Cron-based schedulers combined with Jenkins fully automate this process. The design of the stack is minimal, simple, and beautiful, echoing the Python philosophy.

My Research

While I was at Duke University, I worked for Duke Robotics, and within this wonderful lab I worked on several projects:

  • Drone Detection Project:

    This project aims to detect drones using acoustic and radio sensors in order to alert the administrator of a private space. The homepage of the project can be found here. I'm currently leading a team of four to build a system that detects both drones and humans in restricted areas and sends notifications.

  • SHADO Dispatch-Operator Simulator:

    This project is a computer simulation of the taskload in dispatch-operator systems, such as air-traffic controllers and pilots, or railroad dispatchers and train engineers. We used the software we built to study the impact of automation; it is hosted as a web application here. I was responsible for developing the backbone of the Java code before I left the project. More recently, I was tasked with reworking the application to follow RESTful API conventions.

  • HAIER Dijkstra Simulation:

    This JavaFX simulation program was built in collaboration with JPL and tracks how people respond to uncertainty in automation. Analysis of the data from this study yielded some interesting results. I was responsible only for developing the data logging for this program. The program and its source code can be found here.

My Projects

I stand on the shoulders of those who contribute to the open source community and rely heavily on their libraries and utilities, so I try to make contributions of my own. Though limited by skill and time, I try to create tools that others might find useful.

  • Web Scraper Project:

    Not content with the massive scraper utility that is the Scrapy library, I created a simple, lightweight, browser-enabled (JavaScript-capable) scraper class. It also comes with a tool to crawl your favorite Baidu Tieba community for images. GitHub link

  • Framediff Project:

    This utility project detects moving objects against a stable reference plane. It uses the traditional frame difference method, which is simple and widely applicable. GitHub link

  • Survey Server:

    Fed up with the high cost of running surveys through sites such as SurveyMonkey, and with the risk of losing data (this happened in one of our projects), I decided to write a survey framework in PHP that keeps all surveys on a local server. I also decided to publish the code so that anyone with access to a LAMP instance can host and take their own surveys. The survey server is currently hosted here; feel free to take the code and host it on your own server. GitHub link

  • The GFW Pinger:

    This is a pinger server physically located in China that detects which websites are unavailable within the country. The website uses Django as its backend to keep track of historical results, and it is a single-page application, built with AJAX, that lets users add their own websites to monitor.

  • The Particle Game:

    This is my JavaScript port of the original particle game.
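
The frame difference method used by the Framediff project compares each new frame against a reference frame and flags pixels whose intensity changed by more than a threshold. The sketch below is only an illustration of that idea, not the project's actual code: it assumes grayscale frames given as equal-sized lists of 0-255 intensities from a fixed camera, and the function names frame_difference_mask and bounding_box, along with the default threshold of 25, are hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple

# Assumption: a grayscale frame is a list of rows of 0-255 pixel intensities.
Frame = List[List[int]]
Mask = List[List[bool]]


def frame_difference_mask(prev: Frame, curr: Frame, threshold: int = 25) -> Mask:
    """Mark pixels whose absolute intensity change exceeds `threshold` (hypothetical default)."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same shape")
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) around all changed pixels, or None if nothing moved."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, changed in enumerate(row) if changed]
    return (min(rows), min(cols), max(rows), max(cols))


if __name__ == "__main__":
    background = [[10] * 4 for _ in range(4)]
    frame = [row[:] for row in background]
    for r, c in [(1, 2), (1, 3), (2, 2), (2, 3)]:
        frame[r][c] = 200  # a bright object enters the scene
    print(bounding_box(frame_difference_mask(background, frame)))
```

The threshold is the main design knob: too low and sensor noise or lighting flicker shows up as motion, too high and faint or slow-moving objects are missed. This is also why the method assumes a stable reference plane; a moving camera changes nearly every pixel.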

Visit my GitHub account for more interesting things!

I also learn from and help people on Stack Overflow.