Parsing USPTO Patents to Create a Massive Free Labeled Dataset

Patents at the United States Patent and Trademark Office (USPTO) are arguably one of my favorite (and one of the best) labeled text datasets. Every patent is freely available with labeled images, an abstract, claims, a long description, authors, dates, classification labels, etc. Data in the provided format can be used for a lot of natural language processing (NLP) […]

Predictions for 2030

As the decade (the 2010s) comes to a close, it seems a good time to reflect and also to predict what will come in the next ten years. Below is a set of my predictions for 2030. I enjoy tracking my predictions and their accuracy (even keeping an account on By tracking predictions it’s possible to see […]

Over and Under Qualified

In 2015, I was offered corporate funding for a PhD at the University of Illinois Urbana-Champaign (UIUC). Further, I had a professor interested in working with me and an interesting area of research. Instead, I decided to pursue a career in industry with my newly earned B.S. in Computer Science. I joined Capital One. Qualifications […]