Read To Me
Project submission for AWS DeepLens Challenge
This project has been updated to use MXNet. I have also added some performance improvements to the text cleanup process. Please see the repo for the latest changes.
For this project, I wanted to build an application that could read books to children. To achieve this, I designed a workflow that performs the following steps:
- Determine when a page with text is in the camera frame
- Clean up the image using OpenCV
- Perform OCR (Optical Character Recognition)
- Transform text into audio using AWS Polly
- Play back the audio through speakers plugged into DeepLens
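The steps above can be sketched as a chain of small functions. This is a minimal sketch, not the project's actual code: `read_page`, `text_to_speech`, and the stage callables `detect`, `clean`, `ocr`, and `play` are hypothetical names, and the `Joanna` voice is an assumed default. The Polly step uses boto3's `synthesize_speech` call, with the client passed in so the flow can be exercised without AWS credentials.

```python
def text_to_speech(text, polly_client, voice_id="Joanna"):
    """Turn text into MP3 bytes with Polly.

    polly_client is expected to be boto3.client("polly"). The voice is an
    assumed default, not necessarily the one the project uses.
    """
    response = polly_client.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    return response["AudioStream"].read()


def read_page(frame, detect, clean, ocr, polly_client, play):
    """Run one camera frame through the five workflow steps.

    detect, clean, ocr and play are hypothetical callables standing in for
    the object detection model, the OpenCV cleanup, Tesseract, and the
    speaker output. Returns the recognized text, or None if nothing was read.
    """
    region = detect(frame)                      # 1. find a text block in the frame
    if region is None:
        return None
    cleaned = clean(region)                     # 2. clean up the image
    text = ocr(cleaned).strip()                 # 3. perform OCR
    if not text:
        return None
    audio = text_to_speech(text, polly_client)  # 4. transform text into audio
    play(audio)                                 # 5. play back the audio
    return text
```

On the device, `polly_client` would be created with `boto3.client("polly")`, which requires AWS credentials with permission to call Polly.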
I used TensorFlow to create an object detection model. At the time of this writing, the onboard Intel Model Optimization library does not work with TensorFlow models. Once it is fixed, I will be able to optimize this model to run on the GPU on the DeepLens device.
My dataset was made from a few hundred photos of my kids' books taken in various lighting conditions, orientations, and distances.
Following the tutorial, I used labelImg to annotate my dataset with bounding boxes so I could train the model to identify Text Blocks on a page.
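By default, labelImg saves each annotation as a Pascal VOC XML file, with one `object` element per bounding box. As a minimal sketch of how those boxes can be read back before training, the snippet below parses one annotation with the standard library. The `text_block` label, the file name, and the coordinates in the sample are made up for illustration.

```python
import xml.etree.ElementTree as ET


def load_boxes(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) tuples from a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bnd = obj.find("bndbox")
        coords = tuple(
            int(float(bnd.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return boxes


# Hypothetical annotation for a single page photo.
SAMPLE = """
<annotation>
  <filename>page_001.jpg</filename>
  <object>
    <name>text_block</name>
    <bndbox>
      <xmin>48</xmin><ymin>120</ymin><xmax>590</xmax><ymax>410</ymax>
    </bndbox>
  </object>
</annotation>
"""

print(load_boxes(SAMPLE))
```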
Here is the model that I trained.
This project is built using GreenGrass, Python 3.6, TensorFlow, OpenCV, Tesseract, and AWS Polly.
Instructions for testing
There is a test Python script that you can use to test the application on your development machine before deploying to the DeepLens. You will need to install a few dependencies before you can run the application. I recommend creating a virtual environment and installing the following dependencies with pip.
To run this project on the DeepLens, you will need to install Tesseract and TensorFlow.
In order to get sound to play on the DeepLens, you will need to grant GreenGrass permission to use the Audio Card.
GreenGrass requires you to explicitly authorize all the hardware that your code has access to. One way to configure this is through the Group Resources section in the AWS IoT console. Once configured, you deploy these settings to the DeepLens, which results in a JSON file being deployed to the greengrass directory on the device.
To enable Audio playback through your Lambda, you need to add two resources. The sound card on the DeepLens is located at the path “/dev/snd/”. You need to add both “/dev/snd/pcmC0D0p” and “/dev/snd/controlC0” in order to play sound.
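Before debugging audio code, it can help to confirm that both device nodes are present and accessible to the process. The check below is a small sketch using only the standard library. `missing_devices` is a hypothetical helper, and it only reports what the current process can see, so it is most informative when run from inside the Lambda itself.

```python
import os

# The two device nodes named above.
AUDIO_DEVICES = ["/dev/snd/pcmC0D0p", "/dev/snd/controlC0"]


def missing_devices(paths=AUDIO_DEVICES):
    """Return the paths that do not exist or cannot be opened for read/write."""
    return [
        path
        for path in paths
        if not (os.path.exists(path) and os.access(path, os.R_OK | os.W_OK))
    ]


problems = missing_devices()
if problems:
    print("Audio devices not accessible:", ", ".join(problems))
else:
    print("Both audio devices are accessible.")
```

If either path is reported, the resource is probably missing from the group configuration or the latest deployment has not reached the device.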
In order to get the Text Area cleaned up to perform OCR, it needs to go through a number of filters. This graphic shows the steps that ReadToMe goes through with each image before trying to turn the image into text.