The underlying process of this analysis is called entity extraction, entity identification, or named entity recognition. It seeks to locate small elements in text and classify them into predefined categories, in this case dates. We use the OpenNLP library to run the text through a series of automated steps: tokenization, part-of-speech tagging, and entity extraction. The extracted date entities and the sentences that contain them are then displayed in Simile Timeline, so a researcher can review every sentence that mentions a date by examining the timeline.
- OpenNLP – http://opennlp.sourceforge.net
- Simile Timeline – http://code.google.com/p/simile-widgets/
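
The pipeline described above (split the text into sentences, tokenize, pull out date entities, and pair each date with its sentence) can be sketched in a few lines of Python. This is only an illustration under stated assumptions and does not use OpenNLP: a hand-written regular expression stands in for OpenNLP's trained date model, part-of-speech tagging is skipped, and the names `DATE_PATTERN`, `split_sentences`, `tokenize`, and `extract_date_events` are hypothetical. The sample text is made up for the example.

```python
import re

# Hypothetical stand-in for a trained date model: a regular expression that
# recognizes a few common English date shapes. OpenNLP uses statistical
# models instead; this pattern only illustrates the idea.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE_PATTERN = re.compile(
    rf"\b(?:(?:{MONTHS}) \d{{1,2}}, \d{{4}}"  # e.g. March 4, 1861
    rf"|\d{{1,2}} (?:{MONTHS}) \d{{4}}"       # e.g. 4 March 1861
    rf"|(?:{MONTHS}) \d{{4}}"                 # e.g. November 1864
    rf"|\d{{4}})\b"                           # e.g. 1865
)


def split_sentences(text):
    """Naive sentence detection: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive tokenization: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def extract_date_events(text):
    """Return one record per date found, paired with the sentence it came from."""
    events = []
    for sentence in split_sentences(text):
        for match in DATE_PATTERN.finditer(sentence):
            events.append({"date": match.group(0), "sentence": sentence})
    return events


if __name__ == "__main__":
    sample = (
        "Lincoln was inaugurated on March 4, 1861. The war dragged on. "
        "He was re-elected in November 1864 and died in 1865."
    )
    print(tokenize("He died in 1865."))
    for event in extract_date_events(sample):
        print(event["date"], "|", event["sentence"])
```

Each record pairs a date string with its full sentence, which is the information a timeline entry needs. Converting those records into the event format a timeline widget expects would be a separate step and is not shown here.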