A framework for embedding hybrid term proximity score with standard TF-IDF to improve the performance of recipe retrieval system
David Johnson
Hongyang Sun
Information retrieval (IR) systems play an important role in retrieving relevant information from large collections of data, such as documents, web pages, and other multimedia content. An IR system in a given domain allows users to collect the information relevant to them. Unfortunately, modern recipe websites present their audience with numerous recipes in a colorful user interface but offer very little capability to search and narrow down content based on specific interests. The goal of this project is to develop a search engine for recipes using standard TF-IDF weighting and to improve on the performance of this standard IR baseline by incorporating term proximity. Term proximity is calculated with a hybrid approach that combines span-based and pair-based approaches. The project architecture includes a crawler, a database, an API, a service responsible for TF-IDF weighting and term proximity calculation, and a web application to present the search results.
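To make the idea of combining TF-IDF with a hybrid proximity score concrete, the following is a minimal Python sketch and not the project's actual implementation. Every formula in it is an assumption chosen for illustration: the smoothed IDF, the pair-based score (mean inverse minimum distance between matched query terms), the span-based score (number of matched terms divided by the length of the shortest window covering them), and the hypothetical mixing parameters `alpha` and `weight`. All function names are likewise hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def tf_idf_score(query_terms, doc_tokens, doc_freq, n_docs):
    """Sum of log-scaled TF times smoothed IDF (one common variant, assumed here)."""
    counts = Counter(doc_tokens)
    score = 0.0
    for term in set(query_terms):
        tf = counts[term]
        if tf == 0:
            continue
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        score += (1.0 + math.log(tf)) * idf
    return score


def term_positions(query_terms, doc_tokens):
    """Map each query term found in the document to its token positions."""
    wanted = set(query_terms)
    positions = {}
    for index, token in enumerate(doc_tokens):
        if token in wanted:
            positions.setdefault(token, []).append(index)
    return positions


def pair_proximity(positions):
    """Pair-based: mean of 1 / (minimum distance) over pairs of matched terms."""
    pairs = list(combinations(sorted(positions), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        distance = min(abs(i - j) for i in positions[a] for j in positions[b])
        total += 1.0 / distance
    return total / len(pairs)


def span_proximity(positions):
    """Span-based: matched terms / length of the shortest window covering them."""
    need = len(positions)
    if need < 2:
        return 0.0
    events = sorted((p, t) for t, ps in positions.items() for p in ps)
    window, covered, left, best = Counter(), 0, 0, None
    for p_right, t_right in events:
        if window[t_right] == 0:
            covered += 1
        window[t_right] += 1
        while covered == need:  # shrink from the left while still covering all terms
            p_left, t_left = events[left]
            width = p_right - p_left + 1
            if best is None or width < best:
                best = width
            window[t_left] -= 1
            if window[t_left] == 0:
                covered -= 1
            left += 1
    return need / best


def hybrid_proximity(positions, alpha=0.5):
    """Hypothetical linear mix of the span-based and pair-based scores."""
    return alpha * span_proximity(positions) + (1 - alpha) * pair_proximity(positions)


def rank(query, docs, alpha=0.5, weight=1.0):
    """Return document indices ordered by TF-IDF plus weighted hybrid proximity."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    query_terms = tokenize(query)
    scored = []
    for index, tokens in enumerate(tokenized):
        base = tf_idf_score(query_terms, tokens, doc_freq, len(docs))
        prox = hybrid_proximity(term_positions(query_terms, tokens), alpha)
        scored.append((base + weight * prox, index))
    return [index for _, index in sorted(scored, key=lambda pair: -pair[0])]
```

In this sketch, two recipes that both mention "chicken" and "soup" once receive the same TF-IDF score, so the proximity term alone decides their order: a recipe where the two words are adjacent ranks above one where they are far apart. Adding the proximity score to the TF-IDF score is only one possible way to combine them; a multiplicative boost would be an equally plausible choice.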