Bon Service is a web application that allows chefs to write, standardize, and share their recipes with their kitchen team members. One of the main challenges we encountered was providing access to real data. Our initial solution was to allow manual entry of ingredients, prices, and other relevant information such as origin, supplier name, and the date of the last price update.
Although this approach is what our free version uses, the paid version should offer something more robust: something that truly saves time on management and eliminates the burden of manual data entry.
To achieve this, we needed to extract ingredients, prices, and other relevant information directly from receipts. This led me to develop a simple receipt extractor API using Python, OpenAI's GPT-4o, and PaddleOCR.
The Spark of Inspiration
My colleague Remi and I were actually working on a completely different school project at the time, and OpenAI had just added the GPT-Vision model to their API. At the end of the day, we decided to experiment with the model and see whether it was possible to prompt it to extract ingredients from the receipt PDFs and images I had on hand.
After roughly 25 minutes of prompting, we had a working theory: it would be possible to use GPT to format the data, provided we could feed it good-quality text. The Vision model was not very good at interpreting receipts, especially ones that were not in English.
This, however, gave me the idea to explore other OCR models that might be better suited to the task, and to feed their text output to GPT-4o to format the data.
Finding The Best OCR Model For Our Use Case
Tesseract OCR
First, we needed a good OCR model that could extract the text from the receipts. I started by looking at open-source OCR models and found Tesseract, which seemed like a good fit for our use case. It worked in TypeScript, which was the language we were using for the project, and I was able to run it on the same server as the actual application.
Tesseract was a decent model, but it had three glaring issues:
Its accuracy in French was suboptimal.
Extracting text from handwritten receipts was nearly impossible.
It couldn't handle PDFs out of the box.
This meant that if I wanted to use Tesseract, I would have to convert the PDF receipts to images and then extract the text from those. This task quickly became a lot more complicated than I thought it would be, especially in TypeScript.
While looking for a way to convert my PDFs, all signs pointed me toward Python. Not only did Python have easier ways to convert PDFs to images, it also offered a variety of other options for OCR. So, without thinking about it twice, I decided to write the API in Python.
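For illustration, this is roughly what that conversion looks like in Python with `pdf2image`, one common option I'm using here purely as an example (it relies on the `poppler` utilities being installed on the system):

```python
# A minimal sketch of PDF-to-image conversion with pdf2image
# (an example library choice; it requires poppler to be installed).
from pdf2image import convert_from_path

# Render each page of the receipt PDF as an image at 300 DPI.
pages = convert_from_path("receipt.pdf", dpi=300)

# Save each page as a PNG that an OCR model can consume.
for i, page in enumerate(pages):
    page.save(f"receipt_page_{i}.png", "PNG")
```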
PaddleOCR
After a few more hours of research, I stumbled upon PaddleOCR, an open-source Optical Character Recognition (OCR) tool developed by PaddlePaddle, which is an open-source deep learning platform created by Baidu. PaddleOCR is designed to provide a comprehensive solution for text detection and text recognition in images. It supports a wide range of languages and is known for its accuracy and efficiency.
The base models are very powerful, and PaddleOCR can handle both images and PDFs out of the box. This was enough to convince me to try it out.
In just a few hours I rewrote the code in Python and had a working prototype. Now I needed to build the API using Flask and feed the text output to GPT-4o to format the data and return it to my Next.js application.
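Here is a minimal sketch of what that prototype boiled down to, using the `paddleocr` package's 2.x API (the file name is a placeholder):

```python
# A minimal sketch of the OCR prototype using the paddleocr package (2.x API).
from paddleocr import PaddleOCR

# lang="fr" loads the French recognition model; use_angle_cls helps with
# rotated or skewed receipt photos.
ocr = PaddleOCR(use_angle_cls=True, lang="fr")

# PaddleOCR accepts both images and PDFs out of the box.
result = ocr.ocr("receipt.pdf", cls=True)

# Flatten the per-page results into plain text to hand off to GPT-4o.
lines = []
for page in result:
    for _box, (text, _confidence) in page:
        lines.append(text)

print("\n".join(lines))
```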
Building The API
To expose the receipt extractor, I needed to write a simple API with Flask. When calling the API, the Next.js application passes the receipt file as input, along with the supplier name, and the API returns the extracted data in a user-friendly format.
I chose Flask because it is simple to use and I had already worked with it in the past, on a bot that automates registration for my local CrossFit classes (maybe I'll write a blog post about that in the future).
Writing The Flask Application
Starting from the initial prototype, I built a simple Flask application to handle the requests and the extraction.
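Here is a trimmed-down sketch of that application. The route, the field names, and the `extract_receipts` helper (shown further down) are illustrative rather than the exact production code:

```python
# A trimmed-down sketch of the Flask application; names are illustrative.
import os
import tempfile

from flask import Flask, jsonify, request
from paddleocr import PaddleOCR

app = Flask(__name__)
ocr = PaddleOCR(use_angle_cls=True, lang="fr")

@app.route("/extract", methods=["POST"])
def extract():
    # The Next.js application sends the receipt file and the supplier name.
    file = request.files["file"]
    supplier = request.form.get("supplier", "")

    # Persist the upload so PaddleOCR can read it from disk.
    suffix = os.path.splitext(file.filename)[1]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        file.save(tmp.name)

    result = ocr.ocr(tmp.name, cls=True)
    os.unlink(tmp.name)

    # Flatten the OCR output into plain text.
    text = "\n".join(t for page in result for _, (t, _) in page)

    # extract_receipts (shown later) asks GPT-4o to structure the text.
    return jsonify(extract_receipts(text, supplier))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```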
Adding An API Key And CORS For Extra Security
In order to protect this API from being abused, I added an API key and enabled CORS. This way, only applications that are allowed to access the API can make requests to it.
The API keys are stored in a database that is only accessible by the API itself. If no valid API key is provided, the request is aborted.
This protection matters all the more because the API now receives a free-form string from the client: the supplier notes, which are appended to the base prompt so that users can work with any supplier.
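A sketch of that check; the `X-Api-Key` header, the allowed origin, and the `find_api_key` database helper are all hypothetical names:

```python
# A sketch of the API key check and CORS setup. The header name, origin,
# and find_api_key database helper are hypothetical.
from flask import abort, request
from flask_cors import CORS

# Only allow cross-origin requests from the Bon Service front end.
CORS(app, origins=["https://bonservice.example.com"])

@app.before_request
def require_api_key():
    key = request.headers.get("X-Api-Key")
    # Abort the request if no valid API key is provided.
    if not key or find_api_key(key) is None:
        abort(401)
```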
Connecting The API To OpenAI
To use GPT-4o, we needed to connect the API to OpenAI. GPT-4o is a powerful LLM that can generate text based on a given prompt. We used the `openai` Python library to connect to the model and send the requests.
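Setting up the client only takes a couple of lines, assuming the key is stored in the `OPENAI_API_KEY` environment variable:

```python
# Connect to OpenAI with the official openai library (v1.x API).
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```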
Sending The Receipt To GPT-4o
Once the API is connected to OpenAI, we can send the receipt text to the model and get back the extracted ingredients and prices. We used the `extract_receipts` function to send the OCR output to GPT-4o and get the data back in JSON format.
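A simplified sketch of that function, reusing the `client` created above; the actual prompt is longer, and the JSON field names here are illustrative:

```python
import json

# The real base prompt is more detailed; this captures the idea.
BASE_PROMPT = (
    "You are given the raw OCR text of a supplier receipt. Extract every "
    "ingredient with its price, and respond with JSON of the form "
    '{"ingredients": [{"name": ..., "price": ..., "unit": ...}]}.'
)

def extract_receipts(text: str, supplier_notes: str = "") -> dict:
    # Supplier notes are appended to the base prompt so the model can
    # handle supplier-specific receipt quirks.
    response = client.chat.completions.create(
        model="gpt-4o",
        # Forcing a JSON object keeps the output format consistent.
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": BASE_PROMPT + "\n" + supplier_notes},
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)
```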
This worked surprisingly well: it returned the data in a consistent format over and over again. Now all that was left was to deploy it on my personal server and call it from the Bon Service application.
Deployment
Getting PaddleOCR to work on anything other than a Linux machine was so complicated that I decided to just use a Docker container. I created a Dockerfile that would build a minimal Python container and install all the required dependencies.
The `python:3.10-slim` image is a good starting point for the container; however, I had to install some additional system dependencies to get it to work.
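The Dockerfile ended up looking roughly like this. The extra system packages (`libgl1`, `libglib2.0-0`, `libgomp1`) are the ones PaddleOCR's OpenCV dependency typically needs on a slim Debian image, so treat the list as a starting point rather than an exact recipe:

```dockerfile
# A rough sketch of the Dockerfile; the exact system packages needed
# may vary with the PaddleOCR and Paddle versions you install.
FROM python:3.10-slim

# System libraries that PaddleOCR's OpenCV dependency typically needs
# on a slim Debian image.
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgl1 \
        libglib2.0-0 \
        libgomp1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install the Python dependencies first to take advantage of layer caching.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Bake the OCR models into the image at build time (see below), so they
# are not downloaded while serving requests.
COPY models/ /app/models/
COPY . .

EXPOSE 5000
CMD ["python", "app.py"]
```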
Up until now, every time a request was sent to the API, PaddleOCR would download the models. To avoid this, the models are now added at build time, so they no longer need to be downloaded every time a request is made.
In our Flask application, we simply need to add the lines of code that point PaddleOCR to the bundled detection and recognition models.
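A sketch of that change, assuming the `models/` layout from the Dockerfile above (the subdirectory names are placeholders):

```python
# Point PaddleOCR at the models baked into the image instead of letting
# it download them at runtime. The paths are placeholders matching the
# hypothetical models/ layout from the Dockerfile sketch above.
ocr = PaddleOCR(
    use_angle_cls=True,
    lang="fr",
    det_model_dir="/app/models/det",  # text detection model
    rec_model_dir="/app/models/rec",  # text recognition model
    cls_model_dir="/app/models/cls",  # angle classification model
)
```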
Since our API is deployed with Docker, it would be easy to use a web server such as `NGINX` to create a load balancer that distributes requests between several containers. This would potentially allow us to handle requests faster.
The application is now deployed on a DigitalOcean droplet, but you could also choose your own VPS.
Conclusion
The finished API is a powerful tool for extracting ingredients and prices from receipts, and its ability to handle different languages and receipt formats makes it a valuable asset to our application.
I hope you find this article useful and informative. If you have any questions or feedback, please feel free to reach out to me at hello@juliencm.dev. I'm always happy to hear from you!
Thank you for reading my blog!
Peace nerds,
Julien