Parsing USPTO Patents to Create a Massive Free Labeled Dataset
Arguably one of my favorite (and best) labeled text datasets is the collection of patents held by the United States Patent and Trademark Office (USPTO). Every patent is freely available with labeled images, an abstract, claims, a long description, inventors, dates, classification labels, and more. In the format provided, the data can be used for many natural language processing (NLP) […]
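Because each patent record already carries its own labels, turning one into a supervised NLP example is mostly a matter of pulling fields out of the markup. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes a heavily simplified, hypothetical XML record: the tag layout in `SAMPLE_XML` and the helper names `text_of` and `to_labeled_example` are illustrative assumptions, not the exact USPTO schema, which is much larger and varies between releases.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified patent record (assumed layout, for illustration only).
# Real USPTO bulk XML is far larger and its tag structure differs by schema version.
SAMPLE_XML = """
<us-patent-grant>
  <invention-title>Widget fastener</invention-title>
  <classification-cpc>
    <section>F</section><class>16</class><subclass>B</subclass>
  </classification-cpc>
  <abstract><p>A fastener for attaching widgets to panels.</p></abstract>
  <claims>
    <claim><claim-text>A fastener comprising a head and a shank.</claim-text></claim>
    <claim><claim-text>The fastener of claim 1, wherein the shank is threaded.</claim-text></claim>
  </claims>
</us-patent-grant>
"""


def text_of(elem):
    """Flatten all text inside an element (including nested tags), collapsing whitespace."""
    if elem is None:
        return ""
    return " ".join(" ".join(elem.itertext()).split())


def to_labeled_example(xml_string):
    """Turn one (simplified) patent record into a dict of text fields plus a class label."""
    root = ET.fromstring(xml_string.strip())
    cpc = root.find("classification-cpc")
    # Label = section + class + subclass, e.g. "F16B" (assumed field layout).
    label = (
        "".join(text_of(cpc.find(tag)) for tag in ("section", "class", "subclass"))
        if cpc is not None
        else None
    )
    return {
        "title": text_of(root.find("invention-title")),
        "abstract": text_of(root.find("abstract")),
        "claims": [text_of(c) for c in root.findall("claims/claim")],
        "label": label,
    }


example = to_labeled_example(SAMPLE_XML)
print(example["label"], "->", example["abstract"])
# prints: F16B -> A fastener for attaching widgets to panels.
```

Pairing the abstract (or claims) with the classification code in this way gives a ready-made text classification example; the same pattern extends to the description, dates, or inventor fields once the real tag paths are confirmed against the actual files.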