3 ways to experiment with text analytics

Text analytics, sometimes called text data mining, is the process of uncovering insightful and actionable information, trends, or patterns in text. Once extracted and structured, the data is far easier to work with than the original text, and it becomes easier to judge the information’s quality and usefulness. Developers and data scientists can then use the mined data in downstream data visualizations, analytics, machine learning, and applications.

Text analytics aims to identify facts, relationships, sentiments, or other contextual information. Extraction often starts with tagging entities such as people’s names, places, and products, and can advance to assigning topics, determining categories, and discovering sentiments. When measures such as currencies, dates, or quantities are extracted, establishing their relationship to other entities (and any qualifiers) is a key text analytics capability.
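To make the idea of relating measures to entities concrete, here is a minimal sketch in Python. It assumes a hypothetical, hard-coded list of product names (PRODUCTS) standing in for a real entity tagger, and a few deliberately simple regular expressions for currencies, ISO-style dates, and unit counts. Each extracted measure is attached to the closest product mentioned before it, which is only one naive heuristic for establishing the relationship, not how any particular text analytics product does it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical gazetteer standing in for a real named-entity recognizer.
PRODUCTS = ["Widget Pro", "Widget Lite"]

# Deliberately simple, assumed patterns for a few measure types.
MEASURE_PATTERNS = {
    "currency": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "quantity": re.compile(r"\b\d+\s?units\b"),
}


@dataclass
class Measure:
    kind: str              # which pattern matched
    text: str              # the matched text
    start: int             # character offset in the input
    entity: Optional[str]  # nearest preceding product mention, if any


def extract_measures(text: str) -> list[Measure]:
    # Find every product mention with its character offset, in text order.
    mentions = sorted(
        (m.start(), name)
        for name in PRODUCTS
        for m in re.finditer(re.escape(name), text)
    )
    results = []
    for kind, pattern in MEASURE_PATTERNS.items():
        for m in pattern.finditer(text):
            # Naive heuristic: relate the measure to the closest earlier product.
            prior = [name for pos, name in mentions if pos < m.start()]
            results.append(
                Measure(kind, m.group(), m.start(), prior[-1] if prior else None)
            )
    # Return measures in the order they appear in the text.
    return sorted(results, key=lambda r: r.start)
```

For example, `extract_measures("Widget Pro sold 120 units for $1,499.00 on 2021-03-15; Widget Lite cost $99.")` links the quantity, price, and date to Widget Pro and the $99 price to Widget Lite. A proximity rule like this breaks down quickly on longer or more complex sentences, which is where trained NLP models earn their keep.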

Extracting data from documents versus form fields

The hardest challenges in text analytics are processing enterprise repositories and large documents such as aggregated news from websites, corporate SEC filings, electronic health records, and other unstructured or semistructured documents. Parsing documents poses unique challenges because a document’s size and structure often dictate domain-specific preprocessing rules and NLP (natural language processing) algorithms. For example, categorizing a 1,000-word blog post is a lot easier than ranking all of the topics found in a book collection. Larger documents also often require validating the extracted information based on context; for instance, a patient’s own medical conditions should be categorized separately from the conditions listed in their family history.
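A minimal sketch of that kind of context-aware validation, assuming a note laid out as "Heading:" lines and a tiny hypothetical condition vocabulary (CONDITIONS). Real clinical pipelines rely on medical ontologies and far more robust section detection; the point here is only that the same term is attributed to a different subject depending on which section it appears in.

```python
import re

# Hypothetical condition vocabulary; real systems use clinical ontologies.
CONDITIONS = ["diabetes", "hypertension", "asthma"]

# Assumed layout: a line beginning "Some Heading:" starts a new section.
HEADING = re.compile(r"\s*([A-Za-z ]+):")


def tag_conditions(note: str) -> dict[str, set[str]]:
    """Attribute each condition to the patient or to their family history."""
    tags = {"patient": set(), "family": set()}
    subject = "patient"  # text before any heading is assumed to describe the patient
    for line in note.splitlines():
        heading = HEADING.match(line)
        if heading:
            # Only the "Family History" section describes relatives.
            title = heading.group(1).strip().lower()
            subject = "family" if title == "family history" else "patient"
        lowered = line.lower()
        for condition in CONDITIONS:
            if re.search(rf"\b{condition}\b", lowered):
                tags[subject].add(condition)
    return tags
```

On a note containing "History: patient reports hypertension" and "Family History: mother had diabetes and hypertension", hypertension lands in both buckets, while diabetes is recorded only under family history rather than as one of the patient's own conditions.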

But what if you want to perform a potentially simpler task of extracting information from a form field or other short text snippet? Consider these possible scenarios:

  • Quantify feedback from an employee survey’s open-ended responses
  • Process social media posts for their sentiments about brands or products
  • Categorize different types of chatbot interactions
  • Assign topics to user stories on an agile backlog
  • Route service desk requests based on the problem details
  • Parse information submitted to marketing on your website

These problems call for simpler algorithms than document parsing does, because the text fields are identifiable, short, and often carry a specific type of information.
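As an illustration of how simple a first pass can be for short fields, here is a minimal keyword-matching sketch for the service desk routing scenario. The queue names and keyword sets in ROUTES are hypothetical; a spike would typically start with something like this and then compare it against a trained classifier or a text analytics service.

```python
import re

# Hypothetical routing table: queue name -> trigger keywords.
ROUTES = {
    "network": {"vpn", "wifi", "internet", "network"},
    "accounts": {"password", "login", "locked", "mfa"},
    "hardware": {"laptop", "monitor", "keyboard", "printer"},
}


def route_request(text: str, default: str = "general") -> str:
    """Send a short request to the queue whose keywords it mentions most."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Count distinct keyword hits per queue.
    scores = {queue: len(words & keywords) for queue, keywords in ROUTES.items()}
    # max() keeps the first queue listed when scores tie.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

For instance, "Locked out after too many password attempts" scores two hits for the accounts queue, while a request with no matching keywords falls through to the default queue. Keyword lists are easy to audit but miss synonyms and misspellings, which is one reason to evaluate more capable approaches.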

Let’s say you need to leverage unstructured field data in an application or are asked to include insightful information extracted from text in a data visualization. Text analytics is an important first step, and agile data science teams often use spikes to conduct discovery work. The team needs tools, skills, and methodologies to perform text analytics. Here are three different approaches.

Copyright © 2021 IDG Communications, Inc.
