At Shovels, we aim to make permitting data easier to work with through accurate, systematic AI-powered classification. Here's a look at how we use AI to assign permit categories.
How Our AI Classification System Works
Our system categorizes permits into 22 clearly defined labels, including ADDITION, HVAC, SOLAR, and several other specific permit categories. (View the complete permit category list here.) Spoiler alert: more tags for frequently requested project types, such as fiber, data center, and public utility, are coming soon. Got more ideas? Feel free to share them here.
To maintain high accuracy, we collaborated with Prolific, utilizing their specialized annotators to validate and refine our classifications.
Technical Steps Behind AI Tagging
Our AI tagging process involves multiple layers:
- Direct Pattern Matching: Initially, the AI identifies explicit mentions within permit descriptions.
- Semantic Similarity Search: Next, semantic similarity and nearest-neighbor searches are performed using Faiss, the open-source similarity search library from Facebook AI Research.
- Weak Entity Extraction: If the previous methods yield no definitive matches, the AI extracts significant nouns from the permit descriptions to determine the category.
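To make the layering concrete, here is a minimal sketch of a three-stage cascade in Python. Everything in it is an assumption for illustration: the keyword patterns, the labeled examples, the noun lookup, and the 0.5 threshold are hypothetical, and a bag-of-words cosine similarity stands in for the embedding-plus-Faiss search described above so the example runs with the standard library alone. The third layer is also simplified to a lookup table rather than true noun extraction.

```python
import math
import re
from collections import Counter

# Hypothetical keyword patterns for layer 1 (not the production rules).
PATTERNS = {
    "HVAC": re.compile(r"\b(hvac|furnace|air condition\w*)\b", re.I),
    "SOLAR": re.compile(r"\b(solar|photovoltaic|pv)\b", re.I),
    "ADDITION": re.compile(r"\baddition\b", re.I),
}

# Hypothetical labeled descriptions for layer 2.
EXAMPLES = [
    ("install rooftop photovoltaic panels", "SOLAR"),
    ("replace condenser and air handler", "HVAC"),
    ("add bedroom to rear of house", "ADDITION"),
]

# Hypothetical noun-to-tag hints for layer 3.
NOUN_HINTS = {"ductwork": "HVAC", "inverter": "SOLAR", "sunroom": "ADDITION"}


def tokens(text):
    """Lowercase alphabetic tokens of a description."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(description, threshold=0.5):
    """Return (tags, layer), where layer names the stage that decided."""
    # Layer 1: explicit mentions in the description.
    matched = sorted(t for t, p in PATTERNS.items() if p.search(description))
    if matched:
        return matched, "pattern"

    # Layer 2: nearest labeled example (stand-in for an embedding index).
    query = Counter(tokens(description))
    score, tag = max(
        (cosine(query, Counter(tokens(text))), label) for text, label in EXAMPLES
    )
    if score >= threshold:
        return [tag], "similarity"

    # Layer 3: fall back to significant nouns found in the description.
    hinted = sorted({NOUN_HINTS[t] for t in tokens(description) if t in NOUN_HINTS})
    layer = "entity" if hinted else "none"
    return hinted, layer
```

Each layer runs only when the one before it finds nothing definitive, so cheap, explicit matches are settled first and the fuzzier methods handle the remainder.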
Importantly, our AI approach does not directly output classifications from Large Language Models (LLMs). Instead, we use LLMs to generate structured, deterministic processing code. This mitigates the risk of inaccuracies often associated with direct LLM outputs.
Addressing Common Technical Questions
Customers frequently ask detailed questions about how tags are assigned, and those questions help us keep improving classification precision:
Q: If a permit description states "remove a water heater," does it get the WATER_HEATER tag?
A: Yes. Our AI recognizes terms like "remove," "replace," or "change out" (including abbreviated forms like "c/o"). For example, approximately 2.3 million HVAC-related permits involving "change-outs" have been accurately classified.
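As a rough illustration of how action terms and their abbreviations might be normalized, the sketch below maps "remove," "replace," "change out," and "c/o" to canonical actions while still applying the equipment tag. The regular expressions and the function names `find_action` and `tag_water_heater` are hypothetical and are not taken from our production rules.

```python
import re

# Hypothetical action vocabulary, including the abbreviated form "c/o".
ACTION = re.compile(r"\b(remov\w*|replac\w*|change[\s-]?outs?|c/o)\b", re.I)

# Hypothetical pattern for the equipment itself.
WATER_HEATER = re.compile(r"\bwater\s+heaters?\b", re.I)


def find_action(description):
    """Return a canonical action name, or None if no action term appears."""
    match = ACTION.search(description)
    if match is None:
        return None
    raw = match.group(0).lower()
    if raw.startswith("remov"):
        return "remove"
    if raw.startswith("replac"):
        return "replace"
    return "change_out"  # covers "change out", "change-out", and "c/o"


def tag_water_heater(description):
    """Return (tag, action); the tag applies regardless of which action is found."""
    tag = "WATER_HEATER" if WATER_HEATER.search(description) else None
    return tag, find_action(description)
```

Under these assumptions, `tag_water_heater("remove a water heater")` yields the WATER_HEATER tag together with the action "remove", matching the answer above.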
Q: If a permit explicitly mentions transitioning from GAS to HEAT_PUMP, how does tagging work?
A: Both GAS and HEAT_PUMP tags would typically be applied, clearly reflecting the described transition in the permit.
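A small sketch of multi-label tagging shows why a transition permit can carry both tags: each tag is checked independently, so one match does not exclude another. The patterns and the `assign_tags` helper are hypothetical stand-ins for illustration only.

```python
import re

# Hypothetical patterns; each tag is evaluated on its own.
TAG_PATTERNS = {
    "GAS": re.compile(r"\bgas\b", re.I),
    "HEAT_PUMP": re.compile(r"\bheat\s*pumps?\b", re.I),
}


def assign_tags(description):
    """Return every tag whose pattern appears in the description, sorted."""
    return sorted(
        tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(description)
    )
```

With these patterns, a description such as "Convert gas furnace to heat pump" receives both GAS and HEAT_PUMP, while a description that mentions only one of them receives a single tag.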
Maintaining Classification Accuracy
Through rigorous validation with specialized annotators from Prolific, we consistently monitor and enhance our AI classification accuracy. This careful validation process helps us minimize classification errors and maintain data reliability.
Learn More
For further insights into our data labeling and AI processes, review our detailed case study with Prolific, or contact us today.