As of September 13, 2020, CAT has been decommissioned.
It is with serious regret that I announce that the Coding Analysis Toolkit has been decommissioned. Over time, our ability to maintain the software, now 13 years old, has been impacted by life events and career changes.
A few quick notes:
- Have a need for collaborative text analytics software? Try DiscoverText.
- The CAT code is open source, but unmaintained; someone else could host a version.
- The users did amazing, diverse, and important work; see the list on Google Scholar.
- There were 18,905 primary CAT accounts and 1,801 sub-accounts. CAT users uploaded 10,956 coded datasets and 23,049 raw datasets; they coded a total of 2,664,933 items, and adjudicators made 242,907 validation choices.
The service grew out of a need to enable groups to accurately classify text and to more easily measure and report the results. We learned a lot about humans and software along the way. I hope some of the ten insights below, gleaned from my 20 years of using software to label text data, will stand the test of time:
- Some labeling/coding tasks are harder than others.
- Some humans are better coders than others.
- Average humans are often not nearly good enough.
- Using and comparing just two coders is problematic (see the agreement sketch after this list).
- Find domain-specific ways to rank humans over time.
- Even the best human coders struggle with some tasks.
- Weight machine-learning (ML) training sets by CoderRank (see the weighting sketch after this list).
- When training machines (ML), start with easier tasks.
- Easier means fewer codes at once; a binary choice is best.
- Easier means "relevance" first to clean messy data.
I want to thank all the users who patiently sent feedback in the early years. It has been a journey... but we are signing off.