Wednesday, 7 August 2019

Docker build for TensorFlow 1.14 + Jupyter for Raspbian Buster on Raspberry Pi 4B

I've updated an old Docker build to create a Docker image for Raspbian Buster on the Raspberry Pi 4B. It contains Katsuya Hyodo's TensorFlow wheel, which has TensorFlow Lite enabled.

The image contains Jupyter, so you can connect to the running image from anywhere on your network and run TensorFlow notebooks on the Pi.

This is highly experimental - don't use it for anything important :)

Once I've done some tidying up (and a lot more testing!) I'll put the image up on Docker Hub.

To build/run it you need the Docker nightly build. I installed it by invoking

curl -fsSL | CHANNEL=nightly sh

If you find any problems, please raise an issue on GitHub.

Sunday, 21 July 2019

Benchmarking process for TF-TRT, and a workaround for the Coral USB Accelerator

A couple of days ago I published some benchmarking results running a TF-TRT model on the Pi and Jetson Nano. I said I'd write up the benchmarking process. You'll find the details below. The code I used is on GitHub.

I've also managed to get a Coral USB Accelerator running with a Raspberry Pi 4. I encountered a minor problem, and I have explained my simple but very hacky workaround at the end of the post.

TensorFlow and TF-TRT benchmarks


The process was based on this excellent article, written by Chengwei Zhang.

On my workstation

I started by following Chengwei Zhang's recipe. I trained the model on my workstation and then copied trt_graph.pb from there to the Pi 4.

On the Raspberry Pi 4

I used a virtual environment created with pipenv, and installed jupyter and pillow.

I downloaded and installed this unofficial wheel.

I tried to run step2.ipynb but encountered an import error. This turned out to be an old TensorFlow bug resurfacing.

The maintainer of the wheel will fix the problem when time permits, but I used a simple workaround.

I used cd `pipenv --venv` to go to the location of the virtual environment, and then ran cd lib/python3.7/site-packages/tensorflow/contrib/ to move to the location of the offending file, __init__.py.

The problem lines are

if os.name != "nt" and platform.machine() != "s390x":
    from tensorflow.contrib import cloud

These try to import cloud from tensorflow.contrib, which isn't there and fortunately isn't needed :)

I replaced the second line with pass using

sed -i '/from tensorflow.contrib import cloud/s/^/ pass # /' __init__.py

and then re-ran step2.ipynb and captured the timings.
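The edit made by the sed command can also be sketched in Python. This is an illustration only, not the command used above: the function name neutralise_cloud_import is hypothetical, and the sample text is a cut-down stand-in for the real file. It swaps the import for pass, keeps the indentation, and leaves the old import behind as a comment.

```python
import re

# Matches the offending import on a line of its own, capturing its indentation.
CLOUD_IMPORT = re.compile(
    r"^([ \t]*)(from tensorflow\.contrib import cloud)[ \t]*$",
    re.MULTILINE,
)


def neutralise_cloud_import(source: str) -> str:
    """Replace the cloud import with `pass`, keeping the old line as a comment."""
    return CLOUD_IMPORT.sub(r"\1pass  # \2", source)


# Cut-down stand-in for the file contents (an assumption, for illustration).
sample = (
    'if os.name != "nt" and platform.machine() != "s390x":\n'
    "    from tensorflow.contrib import cloud\n"
    "from tensorflow.contrib import cluster_resolver\n"
)

patched = neutralise_cloud_import(sample)
print(patched)
```

Only the cloud import changes; other imports from tensorflow.contrib are left alone.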

Later, I ran raw-mobile-netv2.ipynb to see how long it took to run the training session, and to save the model and frozen graph on the Pi.

On the Jetson Nano

I used the Nano that I had configured for my series on Getting Started with the Jetson Nano; it had NVIDIA's TensorFlow, pillow and jupyter lab installed.

I found that I could not load the saved/imported trt_graph.pb file on the Nano.

Since running the original training stage on the Pi did not take as long as I'd expected, I ran step1.ipynb on the Nano and used the locally created trt_graph.pb file which loaded OK.

Then I ran step2.ipynb and captured the timings which I published.
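The actual timing code is in the notebooks. As a rough sketch of how a seconds-per-image figure can be captured, the snippet below times repeated calls to a prediction function after an untimed warm-up. The names benchmark and fake_predict are hypothetical, and fake_predict just sleeps in place of a real model.

```python
import time


def benchmark(predict, images, warmup=1):
    """Return (seconds_per_image, fps) for calling predict on each image."""
    for image in images[:warmup]:
        predict(image)  # warm-up calls are not timed
    start = time.perf_counter()
    for image in images:
        predict(image)
    elapsed = time.perf_counter() - start
    seconds_per_image = elapsed / len(images)
    return seconds_per_image, 1.0 / seconds_per_image


def fake_predict(image):
    """Stand-in for a real inference call (an assumption for this sketch)."""
    time.sleep(0.01)


seconds, fps = benchmark(fake_predict, list(range(5)))
print(f"{seconds:.3f} s/image, {fps:.1f} FPS")
```

The warm-up matters because the first call to a freshly loaded graph is usually much slower than later ones, and including it would skew the average.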

Using the Coral USB Accelerator with the Raspberry Pi 4

The Coral USB Accelerator comes with very clear installation instructions, but these do not currently work on the Pi 4.

A quick check of the install script revealed a hard-coded check for the Pi 3B or 3B+. Since I don't normally use the Pi 3B, I changed that entry to accept a Pi 4.
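The install script itself is a shell script, so the following Python is only an illustration of the kind of check involved. The function name and the exact model strings are assumptions made for this sketch, and the real script's strings may differ.

```python
def is_supported_board(model, supported_prefixes):
    """Return True if the board's model string starts with a supported prefix."""
    return any(model.startswith(prefix) for prefix in supported_prefixes)


# Illustrative prefixes only (assumed, not copied from the install script).
ORIGINAL = ("Raspberry Pi 3 Model B Rev", "Raspberry Pi 3 Model B Plus Rev")
PATCHED = ORIGINAL + ("Raspberry Pi 4 Model B Rev",)

pi4 = "Raspberry Pi 4 Model B Rev 1.1"
print(is_supported_board(pi4, ORIGINAL))  # False
print(is_supported_board(pi4, PATCHED))   # True
```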

When I ran the modified script I found a couple of further issues.

The wheel installed in the last step of the script expects you to be using python3.5 rather than the Raspbian Buster default of python3.7.

As a result, I had to (cough)

cd /usr/local/lib/python3.7/dist-packages/edgetpu/swig/

and change all references in the demo paths from python3.5 to python3.7.

With these changes, the USB accelerator works very well. There are plenty of demonstrations provided. In this image it's correctly identified two faces in an image from a workshop I ran at Pi Towers a couple of years ago.

It's an impressive piece of hardware. I am particularly interested in the imprinting technique, which allows you to add new image recognition capability without retraining the whole of a compiled model.

Imprinting is a specialised form of transfer learning. It was introduced in this paper, and it appears to have a lot of potential. Watch this space!

Friday, 19 July 2019

Benchmarking TF-TRT on the Raspberry Pi and Jetson Nano

Trying to choose between the Pi 4B and the Jetson Nano for a Deep Learning project?

I recently posted some results from benchmarks I ran training and running TensorFlow networks on the Raspberry Pi 4 and Jetson Nano. They generated a lot of interest, but some readers questioned their relevance. They weren't interested in training networks on edge devices.

Most people expect to train on higher-power hardware and then deploy the trained networks on the Pi and Nano. If they use TensorFlow for training, they have several choices for deployment:

  1. Standard TensorFlow
  2. TensorFlow Lite
  3. TF-TRT (a TensorFlow wrapper around NVIDIA's TensorRT, or TRT)
  4. Raw TensorRT

In this post I'll focus on timing Standard TensorFlow and TF-TRT. In a later post I plan to cover TensorFlow Lite on the Pi with and without accelerators like the Coral Edge TPU coprocessor and the Intel Neural Compute Stick.

I've run a number of benchmarks, and the results have been much as I expected. I did encounter one surprising result, though, which I'll talk about at the end of the post. It's a pitfall that could easily have invalidated the figures I'm about to share.

Benchmarking MobileNet V2

The results I'll report are based on running MobileNetV2 pre-trained with ImageNet data. I've adapted the code from the excellent DLology blog which covers deployment to the Nano. I've also deployed the model on the Pi using a hacked community build of TensorFlow, obtained from here. That has a wheel containing TF-TRT for python3.7, which is the default for Raspbian Buster.

(The wheel seems to have a minor bug. This weekend I'll set up a GitHub repo with the sed script I used to work around that, along with the notebooks I used to run the tests.)

The Pi 4B has 4GB of RAM and was running a freshly updated version of Raspbian Buster.

So here are the results you've been waiting for, expressed in seconds per image and frames per second (FPS).

Platform        Software    Seconds/image   FPS
Raspberry Pi    TF          0.226            4.42
Raspberry Pi    TF-TRT      0.20             5.13
Jetson Nano     TF          0.082           12.2
Jetson Nano     TF-TRT      0.04            25.63

According to these figures, the Nano is roughly three to five times faster than the Pi, and TF-TRT is about twice as fast as standard TensorFlow on the Nano.
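Those ratios come straight from the FPS column of the table. A short check of the arithmetic, with the FPS values copied from the table and the variable names chosen just for this sketch:

```python
# FPS figures copied from the results table.
fps = {
    ("Raspberry Pi", "TF"): 4.42,
    ("Raspberry Pi", "TF-TRT"): 5.13,
    ("Jetson Nano", "TF"): 12.2,
    ("Jetson Nano", "TF-TRT"): 25.63,
}

nano_vs_pi_tf = fps[("Jetson Nano", "TF")] / fps[("Raspberry Pi", "TF")]
nano_vs_pi_trt = fps[("Jetson Nano", "TF-TRT")] / fps[("Raspberry Pi", "TF-TRT")]
trt_vs_tf_nano = fps[("Jetson Nano", "TF-TRT")] / fps[("Jetson Nano", "TF")]
trt_vs_tf_pi = fps[("Raspberry Pi", "TF-TRT")] / fps[("Raspberry Pi", "TF")]

print(f"Nano vs Pi, TF:        {nano_vs_pi_tf:.2f}x")   # 2.76x
print(f"Nano vs Pi, TF-TRT:    {nano_vs_pi_trt:.2f}x")  # 5.00x
print(f"TF-TRT vs TF, Nano:    {trt_vs_tf_nano:.2f}x")  # 2.10x
print(f"TF-TRT vs TF, Pi:      {trt_vs_tf_pi:.2f}x")    # 1.16x
```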

TF-TRT is only slightly faster than standard TensorFlow on the Pi. I'm not sure why this should be, but the timings are pretty consistent. At some stage I'll run some other models, but these will have to do for now.

A benchmarking pitfall

I mentioned one pitfall. When I re-ran the tests for this blog post I got much slower performance for the Nano using TF-TRT - around 5 FPS.

Fortunately Raffaello Bonghi's excellent jtop package saved the day. jtop is an enhanced version of top for Jetsons which shows real-time GPU and memory usage.

Looking at its output, I realised that an earlier session on the Nano was still taking up memory. Once I'd closed the session down, a re-run gave me the 25 FPS which I and others had seen before.
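jtop is the right tool for spotting this interactively. For a scripted pre-flight check before a benchmark run, one rough option is to read MemAvailable from /proc/meminfo, since the Nano's GPU shares system RAM. The sketch below is an assumption-laden illustration rather than part of the original tests: the function name is hypothetical and the sample text is made up.

```python
def available_mib(meminfo_text):
    """Return MemAvailable from /proc/meminfo-style text, in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # the value is reported in kB
    raise ValueError("MemAvailable not found")


# Made-up sample in the /proc/meminfo format, for illustration only.
sample = (
    "MemTotal:        4059712 kB\n"
    "MemFree:          512000 kB\n"
    "MemAvailable:    2048000 kB\n"
)

print(available_mib(sample))  # 2000
```

On a real device the text would come from open("/proc/meminfo").read(). A value much lower than usual is a hint that an earlier session is still holding memory.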

I continue to be impressed by the Pi 4 and the Nano.

While the Nano's GPU offers significantly faster performance on Deep Learning tasks, it costs almost twice as much. Both represent excellent value for money, and your choice will depend on the requirements for your project.