Text analytics as a service

Natural Language Processing

Patent classification service update

We’ve just made a new version of the patent classification service live.

Changes are:

  • Radio buttons to say what you want to happen when you click on a symbol link (populate Context or Explore).
  • Support for query terms like Symbol:A23* and a Symbol Prefix field. These are two ways to do the same thing: the former is more flexible, the latter is more user friendly.
  • Options for unstemmed search.
  • Term auto-complete using exact and fuzzy matches. Exact matches are shown first, and both groups are sorted so that terms matching more classifications come first (other sort orders would be possible).
  • Added a link to the query syntax documentation. The fields available for querying are documented here.
  • Fixed bugs with && and IPC formatting. One issue may remain: the IPC symbol “B65H75/00” is always shown as “B65H75”, whereas CPC can show “B65H75” with a child “B65H75/00”. This is because the CPC data already uses this human-friendly format, whereas the IPC data uses the format A99AZMMMGGGGGZ, which we have to reformat for display.
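
To illustrate the IPC reformatting mentioned in the last item, here is a minimal sketch in Python. It is not the service's own code (that is in the GitHub repository). It assumes the 14-character layout is: section letter, two-digit class, subclass letter, main group left-padded with zeros to four digits, and subgroup right-padded with zeros to six digits. The function name format_ipc is hypothetical.

```python
def format_ipc(raw: str) -> str:
    """Convert a 14-character zero-padded IPC symbol to a display form.

    Assumed layout (an interpretation of A99AZMMMGGGGGZ):
      raw[0:4]  section, class and subclass, e.g. "B65H"
      raw[4:8]  main group, left-padded with zeros, e.g. "0075"
      raw[8:14] subgroup, right-padded with zeros, e.g. "040000"
    """
    if len(raw) != 14:
        raise ValueError(f"expected 14 characters, got {len(raw)}: {raw!r}")
    subclass = raw[:4]
    main_group = raw[4:8].lstrip("0")   # "0075" -> "75"
    subgroup = raw[8:].rstrip("0")      # "040000" -> "04", "000000" -> ""
    if not main_group:
        # Symbol stops at subclass level.
        return subclass
    if not subgroup:
        # An all-zero subgroup is dropped, so B65H75/00 displays as B65H75.
        return subclass + main_group
    # Subgroups are shown with at least two digits ("1" -> "10").
    return f"{subclass}{main_group}/{subgroup.ljust(2, '0')}"


print(format_ipc("B61C0017040000"))
print(format_ipc("B65H0075000000"))
```

Because trailing zeros carry no information in this layout, a symbol ending in /00 cannot be told apart from its main group, which is why the IPC display collapses the two.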

Patent classification update: web browser testing

We’ve now done some extensive browser testing for the web-demo of the patent classification service. The web-demo works on:

  • Chrome
  • Firefox
  • Safari
  • Android phones
  • iDevices (iPad, iPhone)
  • Internet Explorer is a bit trickier, but the following have been tested:
    • IE 11 on Win 8.1
    • IE 10 on Win 7
    • IE 9 on Win 7

The service itself is agnostic to browsers.

IE8 doesn’t support manipulation of XML elements embedded in HTML, which the demo uses extensively. IE8 users will therefore need either to wait for an update from us or to upgrade to a more recent version of IE.

Patent classification update

The USPC, CPC and IPC classification systems have each recently been updated. We have updated our underlying classification data to match:

  • CPC release Dec 2013
  • IPC release Jan 2014
  • USPC release Jan 2014

This brings the patent classification API up to date with current international classification documentation.

Text analytics for: Patent classification

In this project, we developed a searchable index of patent classification codes that allows search by text and by code. We also extended this to allow users to explore the classification hierarchy. This blog entry describes the demonstration web page.

Patent Classification pat-clas.t3as.org

For those unfamiliar with reading patents, we refer you to The Lens and the tutorial, How to read a patent. Within patents, classification codes provide significant benefit in understanding and searching: patents with similar codes are likely to cover similar content. From the British Library (emphasis added):

“The usefulness of patent classification as a means of searching for patents information is a by-product of its primary purpose as a tool for patent examiners. Using patent classification as part of a search to identify patents in a particular field can help the non-expert searcher to focus and refine his search and produce a useful set of references… However, it is a massive and complex tool designed for an expert user group and when it is used by anyone outside that user group it should be applied with care.”

For the non-expert, classification codes are difficult to use. For example, a patent for a locomotive on The Lens has two IPC classification codes associated with it: B61C17/04 and B61D27/00. What do these codes mean?

Entry point: web page

The text analytics service for this project is hosted at pat-clas.t3as.org and has a public GitHub repository for all code. The web page is an open HTML file that accesses the service APIs and presents a simple interface for text- or code-based search of CPC, IPC, and USPTO classification codes.


Free text search

The first field of the web page lets users enter free text and returns matching classification codes. Following our example, let’s choose IPC, and enter “locomotive”. The search returns all IPC codes that contain “locomotive” in their text. The list is sorted by relevance (rank), with the matching search terms highlighted in the results. The search returns at most the top 50 items.

The next field allows the user to find the context of a given code. Let’s try B61C17/04:

Code context

The context is built up from the parent codes in the classification hierarchy, with their associated text stubs. Explore can also be used to view the hierarchy of the classification system, for example to find siblings of the code B61C17/04.
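
The idea behind Context and Explore can be sketched in a few lines of Python. This is only an illustration, not the service's implementation: the HIERARCHY table, its abbreviated text stubs, and the function names context and siblings are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy hierarchy: code -> (parent code, abbreviated text stub).
# The stubs are shortened for illustration and are not the official IPC text.
HIERARCHY: Dict[str, Tuple[Optional[str], str]] = {
    "B": (None, "Performing operations; transporting"),
    "B61": ("B", "Railways"),
    "B61C": ("B61", "Locomotives; motor railcars"),
    "B61C17/00": ("B61C", "Arrangement or disposition of parts"),
    "B61C17/04": ("B61C17/00", "Driving cabins, footplates or engine rooms"),
    "B61C17/06": ("B61C17/00", "Power storing devices"),
}


def context(code: str) -> List[Tuple[str, str]]:
    """Return (code, stub) pairs from the root down to the given code."""
    chain: List[Tuple[str, str]] = []
    current: Optional[str] = code
    while current is not None:
        parent, stub = HIERARCHY[current]
        chain.append((current, stub))
        current = parent
    chain.reverse()  # root first, requested code last
    return chain


def siblings(code: str) -> List[str]:
    """Return the other codes that share the given code's parent."""
    parent = HIERARCHY[code][0]
    return sorted(c for c, (p, _) in HIERARCHY.items() if p == parent and c != code)


for symbol, stub in context("B61C17/04"):
    print(symbol, "-", stub)
print("siblings:", siblings("B61C17/04"))
```

Reading the printed chain from top to bottom gives the meaning of B61C17/04 one level at a time, which is what the Context field shows.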

Classification hierarchy for Locomotive

The screen flow below outlines how to use the web page.

Under the hood: How it works

All the code is available in a public GitHub repository. If you are interested in developing applications that use this code, you should read the README on GitHub.

  1. CPC/IPC/USPTO codes are converted to a list of string descriptions:
    • one for the code itself and
    • one for each ancestor in the hierarchy.

    This is a very simple database app with XML processing to populate the database.

  2. Given a text query, find CPC/IPC/USPTO codes that have descriptions matching the query. A very simple Lucene search app.
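
The two steps above can be sketched as follows. This is a simplified stand-in, not the project's code: a plain term-count ranking in Python replaces Lucene, there is no stemming or highlighting, and the sketch matches only a code's own description (whether the real index also matches ancestor text is not stated here). TREE, descriptions, and search are hypothetical names, and the descriptions are abbreviated for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical input: code -> (parent code, own description).
TREE: Dict[str, Tuple[Optional[str], str]] = {
    "B61": (None, "Railways"),
    "B61C": ("B61", "Locomotives; motor railcars"),
    "B61C17/00": ("B61C", "Arrangement or disposition of parts"),
    "B61C17/04": ("B61C17/00", "Driving cabins, footplates or engine rooms"),
}


def descriptions(code: str) -> List[str]:
    """Step 1: the code's own description, then one per ancestor."""
    out: List[str] = []
    current: Optional[str] = code
    while current is not None:
        parent, text = TREE[current]
        out.append(text)
        current = parent
    return out


def tokens(text: str) -> List[str]:
    """Lower-case alphanumeric tokens; no stemming is applied."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query: str, limit: int = 50) -> List[str]:
    """Step 2: codes whose own description contains a query term.

    Results are ranked by the number of matching tokens, ties broken by code,
    and cut off at `limit` (50 matches the cap used by the web page).
    """
    terms = set(tokens(query))
    scored = []
    for code, (_, text) in TREE.items():
        score = sum(1 for t in tokens(text) if t in terms)
        if score:
            scored.append((score, code))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored[:limit]]


for code in search("locomotives"):
    print(code, descriptions(code))
```

Because this sketch does not stem, the query “locomotive” would not match “Locomotives” here; a real Lucene analyzer with stemming handles that case.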

We’ll post more details soon.

The fine print

  • The service is hosted on Amazon Web Services, with uptime on a best-effort basis and no redundancy.
  • If requested, we may upgrade the hosting to production grade with hardware redundancy.
  • The web page/user interface is a demonstration of the underlying web services and is not designed for any particular use case.

This combines work from Neil Bacon, Gabriela Ferraro and Mats Henrikson.