Semantic Segmentation with tf.Keras

In this article I will talk about semantic segmentation with tf.keras and a possible implementation of a simplified version of the U-Net neural network architecture. This implementation is only an experiment and is not made for any competition. I just wanted to explore semantic segmentation while learning more about machine learning, Keras and computer vision.

What is image segmentation?

Image segmentation is a computer vision technique for understanding what an image shows at the pixel level. It is similar to image recognition, where objects are recognized (and possibly localized), but with segmentation the “recognition” happens for every pixel. Image segmentation therefore gives more granular information about the image contents.

The main applications are:

  • Photo/Video editing and creativity tools
  • Traffic control systems
  • Autonomous vehicles
  • Robotics

What types of segmentation are there?

Semantic segmentation

Semantic segmentation is the process of finding a class label for each pixel. The classes can be different objects, e.g. buildings vs. cars. A class can have subclasses, e.g. vehicle -> car, truck, van etc. Nevertheless, all pixels found to belong to a car are assigned the same label.

Instance segmentation

Instance segmentation is even more advanced. It allows for separation of distinct objects within a single class. So instead of assigning the class car to two cars in an image, it will label the two cars with car1 and car2.
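
To make the difference concrete, here is a toy sketch in plain Python. The 2x6 grid, the class ids and the instance ids are made up for illustration and are not taken from any dataset.

```python
# Toy 2x6 "image": 0 = background, 1 = car (class ids are illustrative).
semantic_mask = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
]

# Instance segmentation additionally separates objects of the same class:
# 0 = background, 1 = first car, 2 = second car.
instance_mask = [
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
]

semantic_labels = {label for row in semantic_mask for label in row}
instance_labels = {label for row in instance_mask for label in row}

print(sorted(semantic_labels))  # [0, 1]
print(sorted(instance_labels))  # [0, 1, 2]
```

Both cars share one label in the semantic mask, while the instance mask keeps them apart.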

Deep Learning Image Segmentation

Neural Network architecture for semantic segmentation

Basic structure

The Encoder

A set of layers that extracts features of an image through a sequence of progressively narrower and deeper filters. During this contraction, spatial information is reduced while the network focuses on the more salient features.


The Decoder

A set of layers that progressively grows the output of the encoder into a segmentation mask with the pixel resolution of the input image.

Skip connections

Long-range connections in the neural network that allow the model to draw on features at varying spatial scales to improve model accuracy.


The U-Net model architecture derives its name from its U shape. The encoder and decoder work together to extract salient features of an image in the contraction path and then use those features in the expansion path to determine which label should be given to each pixel. The encoder is made up of blocks that downscale an image into narrower feature layers using convolutional layers with a non-linear activation function and a max-pooling layer. The decoder mirrors those blocks in the opposite direction, upscaling its output to the original image size and ultimately predicting a label for each pixel. Skip connections cut across the U to improve performance.
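
The U shape can be sketched by tracing how the spatial size and the number of filters change along both paths. This is only an illustration: the function name, the 256 pixel input, the depth of 3 and the 64 base filters are assumptions chosen for the example.

```python
def trace_unet_shapes(input_size=256, depth=3, base_filters=64):
    """Return (stage, spatial size, filters) for a U-shaped network."""
    shapes = []
    skips = []
    size, filters = input_size, base_filters
    for level in range(depth):              # contraction path
        shapes.append((f"down{level + 1}", size, filters))
        skips.append((size, filters))       # remembered for the skip connection
        size //= 2                          # 2x2 max-pooling halves the resolution
        filters *= 2                        # deeper blocks use more filters
    shapes.append(("bottleneck", size, filters))
    for level in range(depth, 0, -1):       # expansion path
        size *= 2                           # upsampling doubles the resolution
        filters //= 2
        assert (size, filters) == skips.pop()  # shapes match the skip connection
        shapes.append((f"up{level}", size, filters))
    return shapes

shapes = trace_unet_shapes()
for stage in shapes:
    print(stage)
```

Each decoder stage ends up with the same resolution as its encoder counterpart, which is what makes the skip connections possible.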

The Dataset: KITTI segmentation

I’m using the KITTI semantic segmentation dataset. It consists of 200 semantically annotated training images and 200 test images, and provides ground truth data for 34 different labels, like road, sidewalk etc. Since not all labels are evenly distributed across the training images, which makes it harder for the model to learn, I will skip some of the provided labels and only focus on 20 meaningful classes. 200 images is quite a low number for training a CNN. Therefore the data will be augmented with rotation, zooming and other image processing operations.

Furthermore, I am not interested in taking part in any competition, so I am not using the provided dev-kit for scientific comparison. Instead I tried to use tf.keras as much as possible.


To judge the success of our model we need to define a metric. For this we will use the Intersection over Union (IoU) metric. Since we have a multi-class problem, we will use the mean IoU over all classes. Luckily for us, tf.keras already provides an implementation: tf.keras.metrics.MeanIoU.
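
To show what the metric measures, here is a simplified sketch in plain Python. It is not the tf.keras implementation; the function name and the toy masks are made up for illustration.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean IoU over all classes that occur in y_true or y_pred."""
    ious = []
    for cls in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        intersection = sum(1 for t, p in pairs if t == cls and p == cls)
        union = sum(1 for t, p in pairs if t == cls or p == cls)
        if union > 0:  # skip classes that are absent from both masks
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0

# Flattened toy masks with two classes (values are illustrative).
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

# Class 0: 1 / 2, class 1: 2 / 3, mean: 7 / 12
print(mean_iou(y_true, y_pred, num_classes=2))
```

A single wrong pixel lowers the IoU of both the class it belongs to and the class it was mistaken for.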

The model with tf.keras

Now we need to build the model for semantic segmentation with tf.keras’s functional API, which is needed because skip connections cannot be expressed with the Sequential API. It is basically just a concatenation of convolution layers with MaxPooling2D for the contraction path and UpSampling2D layers for the expansion path. I’ll be using 3 down and 3 up layers, which also results in 3 skip connections. More layers will probably increase the overall performance, but they will also increase the training time. The model as is already contains 7.7 million trainable parameters.
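
The parameter count can be checked by hand. The sketch below follows the layer layout of the unet() function shown in this article; the helper names are made up, and the 20 output channels are an assumption based on the 20 classes mentioned above.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one 2D convolution layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def double_conv_params(in_channels, filters):
    """Two stacked 3x3 convolutions, as in one encoder block."""
    return conv_params(in_channels, filters, 3) + conv_params(filters, filters, 3)

def up_block_params(in_channels, filters):
    """2x2 convolution after upsampling, then two 3x3 convolutions on the
    concatenation of the skip connection and the upsampled features."""
    return conv_params(in_channels, filters, 2) + double_conv_params(2 * filters, filters)

def unet_params(num_classes, input_channels=3):
    total = 0
    channels = input_channels
    for filters in (64, 128, 256, 512):   # encoder blocks including the bottleneck
        total += double_conv_params(channels, filters)
        channels = filters
    for filters in (256, 128, 64):        # decoder blocks
        total += up_block_params(channels, filters)
        channels = filters
    total += conv_params(channels, num_classes, 1)  # final 1x1 softmax layer
    return total

print(unet_params(num_classes=20))  # 7698580, i.e. about 7.7 million
```

Pooling, upsampling and dropout layers have no trainable parameters, so only the convolutions contribute.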

Loss Function

Each pixel of the network output is compared with the corresponding pixel in the ground truth label image. We apply the sparse_categorical_crossentropy loss to each pixel. It is “sparse” because we do not one-hot encode our categories but use the integer ids directly.
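
For a single pixel the loss boils down to the negative log of the probability predicted for the true class. The following is a minimal sketch in plain Python, not the Keras implementation; the function name and the probabilities are illustrative.

```python
import math

def sparse_categorical_crossentropy(true_class, predicted_probs):
    """Loss for one pixel: negative log of the probability of the true class."""
    return -math.log(predicted_probs[true_class])

# One pixel, three classes; the softmax output is illustrative.
predicted_probs = [0.1, 0.7, 0.2]

# Sparse: the label is just the integer class id ...
sparse_loss = sparse_categorical_crossentropy(1, predicted_probs)

# ... whereas the non-sparse variant expects a one-hot vector.
one_hot = [0, 1, 0]
dense_loss = -sum(t * math.log(p) for t, p in zip(one_hot, predicted_probs))

print(round(sparse_loss, 4))  # 0.3567
```

Both variants give the same value; the sparse one simply avoids building one-hot label images.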

from tensorflow.keras.layers import (Conv2D, Dropout, Input, MaxPooling2D,
                                     UpSampling2D, concatenate)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# MaskMeanIoU is a custom metric that is not shown in this article;
# it has to be defined or imported before calling unet().

def get_conv_layer(parent, filters):
    # Two stacked 3x3 convolutions; 'same' padding keeps the spatial size.
    conv = Conv2D(filters, 3, activation='relu', padding='same', kernel_initializer='he_normal')(parent)
    conv = Conv2D(filters, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv)
    return conv

def get_up_conv_layer(parent, connection, filters):
    # Upsample by 2, then merge with the skip connection along the channel axis.
    up = UpSampling2D(size=(2, 2))(parent)
    up = Conv2D(filters, 2, activation='relu', padding='same', kernel_initializer='he_normal')(up)
    merge = concatenate([connection, up], axis=3)
    conv = Conv2D(filters, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge)
    conv = Conv2D(filters, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv)
    return conv

def unet(number_output_channels, pretrained_weights=None, input_size=(256, 256, 3)):
    inputs = Input(input_size)
    # Contraction path
    conv1 = get_conv_layer(inputs, 64)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = get_conv_layer(pool1, 128)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = get_conv_layer(pool2, 256)
    drop3 = Dropout(0.5)(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(drop3)
    conv4 = get_conv_layer(pool3, 512)
    drop4 = Dropout(0.5)(conv4)
    # Expansion path with skip connections
    up3 = get_up_conv_layer(parent=drop4, connection=drop3, filters=256)
    up2 = get_up_conv_layer(parent=up3, connection=conv2, filters=128)
    up1 = get_up_conv_layer(parent=up2, connection=conv1, filters=64)
    # Per-pixel class probabilities
    conv_last = Conv2D(number_output_channels, 1, activation='softmax')(up1)
    model = Model(inputs=inputs, outputs=conv_last)
    model.compile(optimizer=Adam(learning_rate=1e-3),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy', MaskMeanIoU(name='iou', num_classes=number_output_channels)])
    # Optionally start from previously saved weights.
    if pretrained_weights:
        model.load_weights(pretrained_weights)
    return model


I trained the model on Google Colab with a GPU instance for 10 epochs, which took about 4 hours. It was able to achieve an overall mean IoU of ~65% and a validation IoU of ~25%. This of course leaves room for improvement, but I was aware that reducing the complexity of the model would lead to suboptimal results.

Now let’s see the visual results, which look quite good imho.

Note: this is a test image that was never used during training/validation.

Look how well it segmented the road and sidewalk; the cars and vegetation also look right.

Now I’ll take a look at how well our model generalizes, by using a random image from the internet. Considering the amount of training and the simplicity of the model, I think it’s still really awesome!

Segmentation on Random Image not affiliated with Dataset images


In this article, I’ve familiarized myself with semantic segmentation with tf.keras. The trained neural network does what I wanted it to do, but there are still many issues, most notably the poor mean IoU values, which will need more investigation and tweaking. To improve the model, we could increase the number of contraction/expansion layers. Another interesting option that would also reduce training time is to use a pretrained image recognition model like VGG16 or MobileNet for the contraction side and let the model “only” learn the classification task.

Investigating other models for semantic segmentation with tf.keras, like pyramid structures, Mask R-CNN and DeepLab, will also be very interesting. You can check out the full Python notebook on my GitHub.

Tips and Tricks for python lists

Python is one of my preferred languages. It’s by default highly readable and tends to favor code brevity. These and all the awesome additional libraries out there are the reason it is so popular among programmers of all origins. In this short article I will cover some Tips and Tricks for python lists I’ve encountered during development with python.

A python list is an ordered and mutable collection. In contrast to sets, it allows duplicate members. Tuples are more or less the same as lists, but they cannot be altered. You can find all the functionality of a list here.

So here are 5 basic tips and tricks I’ve been picking up regarding lists.

Tip and Trick 1: List comprehension

List comprehension provides a concise way to create lists. Take the rather lengthy example using a for loop.

products = ['apple', 'banana', 'grapes', 'pear']
products_with_p = []
for product in products:
    if 'p' in product:
        products_with_p.append(product)
print(products_with_p)
# Output:
# ['apple', 'grapes', 'pear']

List comprehension allows us to transform this into a simple, yet readable, one liner. It is also possible to nest several list comprehensions, but be aware that it can get very unreadable quickly.

print([product for product in products if 'p' in product])
# Output:
# ['apple', 'grapes', 'pear']
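
As a small illustration of nesting, the sketch below flattens and transposes a list of lists. The matrix is made up for the example.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Flatten: the for clauses read left to right, like nested for loops.
flat = [value for row in matrix for value in row]
print(flat)  # [1, 2, 3, 4, 5, 6]

# Transpose: a comprehension inside a comprehension, already harder to read.
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```

If a nested comprehension needs a comment to be understood, a plain for loop is usually the better choice.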

Similar to list comprehensions, dictionary comprehensions also exist. Mind the slightly adapted syntax for the key-value pairs that make up the dictionary:

products_with_counts = {'apple': 5, 'banana': 7, 'grapes': 8, 'pear': 9}
products_count_above_7 = {k:v for (k,v) in products_with_counts.items() if v >= 7}
print(products_count_above_7)
# Output:
# {'banana': 7, 'grapes': 8, 'pear': 9}

Tip and Trick 2: Advanced list iterations

Iterating a list (or any iterable) is very easy in python using for … in … syntax.

for element in range(4):
    print(element)
# Output:
# 0
# 1
# 2
# 3

But what to do if you also need the index of the element? enumerate() helps by providing the current index for you.

products = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(products):
    print(c, value)
# Output:
# 0 apple
# 1 banana
# 2 grapes
# 3 pear
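
enumerate() also takes an optional start argument, which is handy when you want a human-friendly position instead of the zero-based index. A short sketch (the variable names are just for illustration):

```python
products = ['apple', 'banana', 'grapes', 'pear']

# The optional start argument sets the number the counter begins with.
for position, product in enumerate(products, start=1):
    print(position, product)
# Output:
# 1 apple
# 2 banana
# 3 grapes
# 4 pear

numbered = list(enumerate(products, start=1))
```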

What about iterating over multiple lists simultaneously? Here comes zip(). It will stop iterating when the shortest iterable is exhausted.

products = ['apple', 'banana', 'grapes', 'pear']
product_count = [5, 7, 8, 9, 10]
for count, product in zip(product_count, products):
    print(count, product)
# Output:
# 5 apple
# 7 banana
# 8 grapes
# 9 pear
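
Note that the count 10 was silently dropped above. If that is not what you want, itertools.zip_longest from the standard library pads the shorter iterable instead. A small sketch, where the fill value 'unknown' is an arbitrary choice:

```python
from itertools import zip_longest

products = ['apple', 'banana', 'grapes', 'pear']
product_count = [5, 7, 8, 9, 10]

# Missing entries of the shorter list are replaced by fillvalue.
pairs = list(zip_longest(product_count, products, fillvalue='unknown'))
print(pairs[-1])  # (10, 'unknown')
```

Since Python 3.10, zip() also accepts strict=True, which raises a ValueError when the lengths differ.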

Tip and Trick 3: Joining lists inside a dictionary

Sometimes you need to combine several lists that are stored as values inside a dictionary. Instead of doing it manually you can combine dict.values() with the built-in sum function. This leverages the fact that lists can be concatenated with the + operator, e.g. [1, 2] + [3, 4] = [1, 2, 3, 4]

simple_dict = {"a": [1,2], "b":[3,4]}
values = sum(simple_dict.values(), [])
print(values)
# Output:
# [1, 2, 3, 4]
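
One caveat: sum() builds a new list at every step, which gets slow when there are many lists. As an alternative sketch, itertools.chain.from_iterable from the standard library walks over all values without creating intermediate lists:

```python
from itertools import chain

simple_dict = {"a": [1, 2], "b": [3, 4]}

# chain.from_iterable lazily yields the items of each list in turn.
values = list(chain.from_iterable(simple_dict.values()))
print(values)  # [1, 2, 3, 4]
```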

Tip and Trick 4: List to string

Often you need to be able to do the reverse of a tokenization, so that you join a list of objects back into a single item. This can be done quite easily in python using the string method join(). Here is an example:

tokens = ["Put", "me", "back", "together"]
print(" ".join(tokens))
# Output:
# Put me back together
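
join() only accepts strings and raises a TypeError for anything else, so other objects need to be converted first. A minimal sketch with made-up numbers:

```python
counts = [5, 7, 8, 9]

# Convert every item to a string before joining.
joined = ", ".join(str(count) for count in counts)
print(joined)  # 5, 7, 8, 9
```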

Tip and Trick 5: Map and filter with lists

Map and filter are functions known from functional programming and allow us to apply functions to lists in a more concise way. Let’s dig into it.

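product_prices = [2, 1, 3, 1.5]
# map applies the function to every element (here: squaring each price)
squared_prices = list(map(lambda x: x**2, product_prices))
print(squared_prices)
# filter keeps only the elements for which the function returns True
prices_less_than_2 = list(filter(lambda x: x < 2, product_prices))
print(prices_less_than_2)
# Output:
# [4, 1, 9, 2.25]
# [1, 1.5]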
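
For comparison, the same results can be written with the list comprehensions from the first tip. Which form reads better is largely a matter of taste; the sketch below squares the prices and keeps those below 2:

```python
product_prices = [2, 1, 3, 1.5]

# Equivalent of map: transform every element.
squared_prices = [price ** 2 for price in product_prices]
# Equivalent of filter: keep only elements that satisfy the condition.
prices_less_than_2 = [price for price in product_prices if price < 2]

print(squared_prices)      # [4, 1, 9, 2.25]
print(prices_less_than_2)  # [1, 1.5]
```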


I hope these Tips and Tricks for python lists help you to use lists much more effectively in the future!

Build a spotify connect enabled stereo-speaker using raspotify


For a long time I have been trying to figure out how to bring a little music to my kitchen. Most of the products you can buy that support Spotify Connect are rather pricey and often offer additional connected services like AirPlay, which I have no need for.
So I’ve decided to build a Spotify Connect enabled stereo speaker on my own. Fortunately the software part is really easy: we can just use the awesome raspotify!

For the hardware I’m using:

  • Raspberry Pi Zero W
  • Hifiberry MiniAMP (3W stereo amplifier)
  • two 2.5″ 4 Ohm full-range speakers (e.g. from Visaton)

Setting up Raspbian on the Pi Zero

Since the radio will be a headless device, we can use the Lite version without a desktop (Raspbian Download). Use balenaEtcher or a similar tool to flash the image to the SD card. Also don’t forget to set up your Wi-Fi credentials in boot/wpa_supplicant.conf. Assigning a static IP to your radio’s MAC address on your router will also make access easier. Additionally, you’ll need to enable SSH on the Pi by creating an empty file named ssh in the root of the SD card’s boot partition.

Adapting the config for the MiniAmp

To tell Linux about the HiFiBerry MiniAmp, we need to add it to config.txt.

Configure device tree overlay file

Set hifiberry-dac as a device tree overlay (dtoverlay) to activate the device. See the knowledge base for more information. You also need to comment out the default onboard audio setting.


Set up software volume control

After booting the Pi, the speakers connected to the MiniAmp should be able to play a sound.

You can test it with:

$ speaker-test -c 2

It works! But be warned: at this point you cannot change the volume. You need to add a software volume control first.

Add the following to the /etc/asound.conf file:

pcm.hifiberry {
    type softvol
    slave.pcm "plughw:0"
    control.name "Master"
    control.card 0
}

pcm.!default {
    type plug
    slave.pcm "hifiberry"
}

And now use the device with speaker-test:

speaker-test -D hifiberry -c 2

After that you have a new ‘Master’ control in the alsamixer.

Setting up Spotify Connect

Setting up Spotify Connect is pretty straightforward thanks to raspotify.
Simply do the following via SSH on your Pi:

curl -sL | sh

After it is finished, adapt /etc/default/raspotify with your Spotify credentials and other configuration.
My config file looks something like this:

#/etc/default/raspotify -- Arguments/configuration for librespot
OPTIONS="--username $SPOTIFY_USER$ --password $SPOTIFY_PASSWD$"
VOLUME_ARGS="--enable-volume-normalisation --linear-volume --initial-volume=50"
BACKEND_ARGS="--backend alsa"

Remember to replace $SPOTIFY_USER$ and $SPOTIFY_PASSWD$ with your own credentials. Then restart the daemon by running:
sudo systemctl restart raspotify

You should now see a KitchenRadio device under the Spotify Connect devices, which means we are more or less finished with building a Spotify Connect enabled stereo speaker.

Improving boot time

The last step is optional but highly recommended. By default Raspbian waits for the network while booting. You can check where the boot time is spent with:
$ systemd-analyze blame
In my case it waited nearly 1 minute for the network. It’s fairly easy to reduce this:

Create the file /etc/systemd/system/networking.service.d/reduce-timeout.conf with the following content:


That reduced my boot time to around 30-40 s, which is far more acceptable than 1.5 minutes. Boot time of course also varies depending on the SD card you are using, so a faster SD card class will also improve it.


In this article I showed you how to build a small Spotify Connect enabled stereo speaker using a Raspberry Pi.
There are some things that still need to be addressed, like adding a button for turning the radio on and off, or a proper enclosure to put everything in neatly.