Reinforcement Learning

Our totally wasted grid world agent calls us and asks if we can safely navigate him home. We use the WhatsApp location sharing feature to determine the agent location. Our goal is to message the agent a policy plan which he can use to find his way back home. Like every drunk guy, the agent does noisy movements: he follows our instructions only in 80% of the time. In 10% of the time, he stumbles WEST instead of NORTH and in the other 10%, he stumbles EAST instead of NORTH. Similar behavior occurs for the other instructions. If he stumbles against a wall, he enters the current grid cell again. He gets more tired of every further step he takes. If the police catch him, he has to pay a fee and is afterward much more tired. On the way back home it could happen, that he gets some troubles with the “hombre malo”, and in his condition, he is not prepared for such situations. He can only rest if he comes home or meets the “hombre malo”. We want to navigate him as optimal as possible.

Wasted agent in a grid world. The agent can move NORTH, EAST, SOUTH, and WEST. The terminal states are \(s_{(1,2)}\) and \(s_{(3,3)}\). If the agent moves against the outer wall, he enters the previous state again. For each action which does not lead to state \(s_{(1,2)}\), \(s_{(3,0)}\), and \(s_{(3,3)}\) he receives always a negative reward of \(-0.1\). AVAILABLE AS OPEN AI GYM.

Markov Decision Process

Because of the noisy movement, we call our problem a non-deterministic search problem. We have to do 3 things to guide our agent safely home: First, we need a simulation of the agent and the grid world. Then we apply a Markov Decision Process (MDP) on this simulation to create an optimal policy plan for our agent which we will finally send via WhatsApp to him. An MDP is defined by a set of states \(s \in S\). This set \(S\) contains every possible state of the grid world. Our simulated agent can choose an action \(a\) from a set of actions \(A\) which changes the state \(s\) of the grid world. In our scenario he can choose the actions \( A = \{NORTH, EAST, SOUTH, WEST\}\). A transition function \(T(s,a,s’)\) yields the probability that \(a\) from \(s\) leads to \(s’\). A reward function \(R(s,a,s’)\) rewards every action \(a\) taken from \(s\) to \(s’\). In our case, we use a negative reward of \(-0.1\) which is also referred to \(\textit{living penalty}\) (every step hurts). We name the initial state of the agent \(s_{init}\) and every state which leads to an end of the simulation terminal state. There are 2 terminal states in our grid world. One with a negative reward of \(-1\) in \((1,2)\) and one with a positive reward of \(1\) in \((3,3)\).

Solving Markov Decision Processes

Our goal is to guide our agent from the initial state \(s_{init}\) to the terminal state \(s_{home}\). To make sure that our agent does not arrive too tired at home, we should always try to guide the agent to the nearest grid cell with the highest expected reward of \(V^{*}(s)\). The expected reward tells us how tired the agent will be in the terminal state. The optimal expected reward is marked with a *. To calculate the optimal expected reward \(V_{k+1}^*(s)\) for every state iteratively, we use the Bellman equation:

\(V^*_{k+1}(s) = \max_{a} Q^{*}(s,a)\)

\(Q^{*}(s,a) =\sum_{s’}^{}T(s,a,s’)[R(s,a,s’) + \gamma V^{*}_{k}(s’)]\\\) \(V_{k+1}^*(s) = \max_{a} \sum_{s’}^{}T(s,a,s’)[R(s,a,s’) + \gamma V_{k}^{*}(s’)]\\\)

\(Q^{*}(s, a)\) is the expected reward starting out having taken action \(a\) from state \(s\) and thereafter acting optimally. To calculate the expected rewards \(V^{*}_{k+1}(s)\), we set for every state \(s\) the optimal expected reward \(V_{0}^{*}(s) = 0\) and use the equation.

As soon as our calculations convergent for every state, we can extract an optimal policy \(\pi^{*}(s)\) out of it. An optimal policy \(\pi^(s)\) tells us the optimal action for a given state \(s\). To calculate the optimal policy \(\pi^{*}(s)\) for a given state \(s\), we use:

\(\pi^{*}(s) \leftarrow arg\max_{a} \sum_{s’}T(s,a,s’)[R(s,a,s’) + \gamma V_k^{*}(s’)]\)

After we calculated the optimal policy \(\pi^{*}(s)\) we draw a map of the grid world and put in every cell an arrow which shows the direction of the optimal way. Afterward, we send this plan to our agent.

This whole procedure is called offline planning because we know all details of the MDP. But what if we don’t know the exact probability distribution of our agent stumbling behavior and how tired he gets every step? In this case we don’t know the reward function \(R(s,a,s’)\) and the transition function \(T(s,a,s’)\)? We call this situation online planning. In this case, we have to learn the missing details by making a lot of test runs with our simulated agent before we can tell the agent an approximately optimal policy \(\pi^*\). This is called reinforcement learning.

Reinforcement Learning

One day later our drunk agent calls us again and he is actually in the same state ;) as the last time. This time he is so wasted that he even can not calculate the reward function \(R(s,a,s’)\) and the transition function \(T(s,a,s’)\). Because of the unknown probabilities and reward function, the old policy is outdated. So the only way to get an approximately optimal policy \(\pi^{*}\) is to do online planning which is also referred to reinforcement learning. So this time we have to run multiple episodes in our simulation to estimate the reward function \(R(s,a,s’)\) and the transition function \(T(s,a,s’)\). Each episode is a run from the initial state of the agent to one of the terminate states. Our old approach of calculating \(\phi^{*}\) would take too long and it doesn’t convergent smoothly in the case of online training. A better approach is to do Q-Learning:

\(Q_{k+1}(s, a) \leftarrow \sum_{s’} T(s,a,s’) [R(s,a,s’) + \gamma \max_{a’} Q_k^{*}(s’, a’)]\\\)

Q-Learning is a sample-based Q-value iteration To calculate the unknown transition function \(T(s, a, s’)\) we empirically calculate the probability distribution of the transition function.

This time we choose our policy based on the q-values of each state and update the \(Q(s, a)\) on the fly. The problem with this policy is, that it exploits always the best \(Q(s, a)\) which is probably not the optimal policy. To find better policies, we introduce a randomization/exploration factor to our policy. This forces the agent to follow sometimes a random instruction from us which leads to a better policy \(\pi^{*}\). This approach is referred to as \(\epsilon\)-greedy learning. After our \(Q(s, a)\) calculations converging, we can send the optimal policy to our agent.

Till now, our agent was lucky and he lived in a stationary grid world (all objects besides the agent are static). But as soon as the other objects start moving the grid world gets more complex and the number of states in the set \(S\) grows enormously. This could lead to a situation where it is not possible to calculate all the \(Q(s, a)\) anymore. In such situations, we have to approximate the \(Q(s, a)\) function. One way to approximate \(Q(s,a)\) functions is to use deep reinforcement learning.

References & Sources

  • Haili Song, C-C Liu, Jacques Lawarrée, and Robert W Dahlgren. Optimal electricity supply bidding by Markov decision process.IEEE transactions on power systems, 15(2):618–624, 2000.
  • Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement learning: A survey.Journal of artificial intelligence research, 4:237–285, 1996.
  • Christopher JCH Watkins and Peter Dayan. Q-learning.Machine learning, 8(3-4):279–292, 1992.
  • Drunk guy image from
  • Hombre malo image from
  • Home image from
  • Police image from

Autonomous Organizations

In a world where machines are increasingly replacing the employment of people, we have to create new lifestyles. Passive income is an excellent way to earn money in such a world. It is derived from a rental property, limited partnership, or another enterprise in which a person is not actively involved. Nowadays, these organizations are mostly managed by public corporations. But soon, more and more autonomous organizations will arise because of the rapid development of artificial intelligence and the blockchain technology.

Based on this development, it could be possible to develop whole companies that work autonomously for people who don’t want to or can’t work anymore. So it could be a new business model for engineers to develop such companies for costumers.

But how could such an organization work internally?

Current neural networks can already evaluate the aesthetics of an image, so it will not take long until neural networks can also evaluate the appeal, usefulness, and other factors of a product. A combination of

  1. Optimization Algorithms
  2. Artificial Intelligence
  3. Simulations
  4. APIs to marketplaces like Amazon

could be enough to run such organizations autonomously. Figure 1 illustrates an Autonomous Organization Framework. The holder inputs a number of assets and configures what the autonomous organization should do. The AO calibrates itself; then, it interfaces with Google Trends to analyze markets. Its analysis could find out that it is currently a profitable moment to sell remote-controlled cars. It could then use social media platforms to find popular models, and search other websites for the toy construction plans. After that, it could access online shops to extract the needed components, simulating many different combinations (optimizing via evolutionary algorithms, evaluating the product via deep learning and, finally, forecasting) to build a competitive RC-car. After constructing the product, it could be sold via the Amazon API.

Figure 1: A Framework for an autonomous organization

Besides such manufactory organizations, it would be probably easier to implement services like e-learning schools first. For example, an online school that teaches costumers how to dance salsa or bachata. From one of my previous posts, you already know how easy it is to develop a neural network that can differentiate between-song genres. It would also be possible to develop a classifier that can be detected if a person is dancing salsa or not.

I really prefer to have especially in social environments people instead of machines in control. But what is with my previous post about language table organizations? It would be much work to automate every single step of such an organization, but it could still be possible.

* This whole framework is for now just a general idea. Such organizations work probably better in a field where the competitive environment is not changing too much.

Storytelling for AI Development

Around five years ago, Amazon released its smart assistant named Amazon Alexa. She has gained a lot of popularity among voice assistants. Most users use her to play music and search the internet for information. But these are just fun tasks. Alexa can help you with more of the daily tasks and make your life easier.

Amazon Alexa tries to be your assistant, but someone who reacts to commands doesn’t show any interpersonal relationship. This type of connection is essential for an assistant job – it builds respect and lets us see assistants like ‘individuals.’ This is one goal that AI developers want to achieve: Artificial Intelligence that is human-like. In this blog post, I want to discuss one possibility of achieving this goal.

Building an Interpersonal Relationship with Storytelling

A while ago, I read “Story” by Robert McKee. This book inspired me to think about human-like AI development a little bit differently.

Robert McKee describes in his books that a story is a metaphor for life – stories awaken feelings. When we watch good movies, we try to adopt the personal perspective of the movie characters. This leads to an ever-stronger bond with the characters in the course of the story.

This is exactly something that we want to achieve for our users and the virtual assistants. The field of storytelling reminds us that there is no way to love or hate a person if you have not heard that person’s story; that’s why our virtual assistant needs to have a backstory. As shown in the movie example, it makes no difference whether the person is real or not.

Often, a connection with movie characters arises when they make decisions in stressful situations. The real genius of a person manifests itself in the choices they make under pressure. The higher the pressure, the more the decision reflects the innermost nature of the figure. Stressful situations often appear for virtual assistants, such as when users ask for unanswerable questions. In this situation, ordinary virtual assistants answer with standard answers like, “I don’t understand,” or “I can’t help you.” In this case, the real character of the virtual assistant is revealed. AI developers have to avoid such standard answers, and instead, the artificial intelligence should answer such problematic user-questions with a backstory. The backstory depends on the use case but should be created by somebody familiar with storytelling and dialog writing. For dialog writing, I can recommend another book from Robert McKee called “DIALOG.”

According to the dramatist Jean Anouilh, fiction gives life a form. Thus, stories support the development of artificial life, making virtual assistants more than just empty shells.

German vs Dutch Carnival Songs

Soon its time for carnival! I have noticed that a lot of German hits are played in Dutch pubs. Because of the similarity of these two languages, I have got often asked whether it is a German or Dutch song. Famous Dutch and German songs are often translated into the other language. This results in an exciting question of how a neural network distinguishes these two music genres. Because it can not directly learn from the music rhythm but must analyze the words inside the song, so it has to filter out the rhythm somehow.

Collect Data

In this little experiment, I wanted to use songs, which are available in both languages. So my first task was to collect these songs. I ended up with the following table.

Dutch SongGerman Song
Het VliegerliedDas Fliegerlied
Schatje mag ik je foto!Schatzi schenk mir ein Foto
Alleen maar schoenen aan*Sie hatte nur noch Schuhe an
Drank & DrugsStoff & Schnaps
Viva HollandiaViva Colonia
Mama LaudaaaMama Laudaaa
Ga es bier halenGeh mal Bier Holen
Liever Te Dik In De KistMich hat ein Engel geküsst
* favorite

Data Preprocessing

I split each song into 3 seconds chunks and converted each chunk into a spectrogram image. These images can be used to identify spoken words phonetically. Each spectrogram has got two geometric dimensions. One axis represents time, and the other axis represents frequency. Additionally, a third dimension indicates the amplitude of a particular frequency at a particular time is represented by the intensity or color of each point in the image. I ended up with 2340 training images (1170 images for each class) and 300 validation images (150 images for each class). For the preprocessing step I used my jupyter notebook.

Example Spectrogram of the German song Mama Laudaaa

Neural Network

I use transfer learning to mainly decrease the running time of the training phase. I chose the RESNET with 18 layers. You can find my code in my GitHub-Repository.


I achieved a training accuracy of 83% and a validation accuracy of 80% in 24 epochs. A long time ago I developed already a music genre classification for bachata and salsa, here I reached a training accuracy of 90% and a validation accuracy of 87% as far as I remember.

Language Exchanges – Random Optimization in Practice

Are you interested in languages and meeting people from other countries? Language exchange can be the best place to do so! Here you meet new friends, discover other cultures, practice your favorite language, and help others who are learning your language!

My last visit at the AvaLingua Exchange meeting in Nijmegen. Here I learned for 1 hour Dutch and taught 1 hour German.

Most of these meetings are organized at regular intervals by volunteers. One of the most time-consuming tasks is to organize the learning groups. To support these dedicated volunteers I developed an optimizer that can organize the groups on its own. This could save a lot of working time.

Random Optimizer

During the last weeks, I had not much time, and so I wanted an optimization method that leads fast to the first results without thinking much about the formula. Plus I wanted to know how random optimizers perform under real-life conditions. For this purpose, I chose a random optimizer.

SPOILER-ALARM: there is no guarantee a random optimizer will find the optimal solution to a given problem. So this prototype should mainly deliver benchmarks for upcoming optimizers.

Optimization Process

Our random optimizer should maximize the number of participants in a language exchange event. For this, we will input the language exchange registrations into the program. Based on the participants it will organize groups so that every participant learns in a 2-hour event at least 1 hour. Preferably he should also teach for 1 hour his own language. I also added the following constraints:

  • Each group needs at least 1 native/advanced teacher
  • In each group, only 1-3 students are allowed
  • Participants have to be busy for 2 hours
  • No participant should teach 2 hours

If a participant is not assigned, he will be denied for this event.


You can find like always the code and a dummy registration file in my GitHub-Repository. In the following picture, you can see the overall UML-diagram of the project. The ResultManager uses the ParticipantBuilder to build participants from the registration file. The TableBuilder builds tables and assigns participants to these tables. The TableEvaluator evaluates the TableBuilder result and saves it. The ResultManager takes care of the best generate result. The random optimization occurs while building the tables and assigning the participants to the tables. The overall process stops after 1000 iterations.

Class diagram of the overall project.


Random Optimization is a method to develop models very fast. This helps us to understand the overall problem much more in detail. When we run this random optimizer on a registration file with 29 participants, it produces an average result with 12 participants (10 times 1000 iterations). This is actually a quite poor result, but now we have got a benchmark for other algorithms, which I will implement as soon I have got more time. So check out my next posts!

Build your optimal circuit training workout via linear programming

The days are getting longer and it is time to prepare for summer again. All the months spent indoors, waiting for the warm weather, eating cookies, and drinking hot chocolate could make you nervous going back to the beach. Imagine a turbocharged workout routine that mixes cardio and strength training and let you train everywhere. Plus, you don’t even need expensive equipment and it’s easily customized to help you to get your best Beach Body ever. Sound too good to be true? It’s not! It’s called circuit training.

Scheveningen, Netherlands

Circuit Training

Circuit training lets you alternate between 4 to 10 exercises that target different muscle groups. The whole idea is to train different muscles all at the same time in a minimum amount of rest. Usually, you train between 15 and 40 minutes. During this time, you will make multiple circuits, with each course containing all exercises. It is a good idea to make 2-3 minute breaks per circuit, 30-90 seconds per exercise, and a 3-5 minute warm-up. The exercises should be arranged so that different muscle groups are trained alternately.

First tries to make a human flag, Germany

To create your perfect circuit training routine you could hire a personal trainer. But let’s be honest: trainers are very expensive. For this task, we will just replace the trainer with an artificial intelligence agent. The agent will receive our training goals (e.g. a bigger biceps or a six-pack) as input and then tell us our personalized circuit training routine.

Linear Programming

To solve this kind of optimization problem we will use linear programming. Linear programming finds an optimum of a linear function that is subjected to various constraints. You can model a lot of problems as linear functions, finding the perfect circuit training is one of them. To solve our problem we have to do the following steps of the optimization workflow:

  1. Identify the exact problem
  2. Model the problem
  3. Choose a tool to deal with the model
  4. Retrieve the solution
  5. Analysis
One of the first applications of linear programming was in the second world war. But it can also be used for diet optimization, in sports and much more.

In our case, we want to build a circuit training that is as compact as possible and trains all of our muscles. So we want to build a linear function/objective function that describes the total amount of time for the workout. We want to optimize this objective function by modifying the design variables \(x_{i} \in \{0,1\}\).

minimize: \( \sum_{i=1}^{n} t_i \cdot x_{i} \)

The design variable \(x_{i} \in \{0,1\}\) is an exercise and the given coefficents \(t_i\) is the needed time for exercise \(x_i\). \(n\) is the number of exercises.

Each exercise has got its own intensity per muscle group. So pull-ups train for example the biceps, the back and also a little bit abs. But how can we measure the intensity? I define intensity as a value between 0 and 1. 0 means it doesn’t train the specific muscle group at all. 1 means it totally trains the specific muscle group and you don’t need to train this muscle group anymore in the current circle. So I collected a bunch of exercises and defined for each muscle group its intensity and saved it into a CSV-file as a table:

pull ups0.51010.50.5000
push ups00100.50.25000

I recommend that you define your own intensities for each exercise. Currently, these intensity values are a little bit relative, if you know a better measurement unit please let me know.

So now we can choose for every circle training how much we want to train at least the specific muscle group. The only thing which is left for our LP-solver is to define these intensities as constraints so that the LP-solver takes the intensities into account. For that reason we create for each muscle group a constraint:

\( \sum_{i=1}^{n} y_i \cdot x_{i} \ge intensity_{shoulders}\)

\(y_i \in [0,1]\) is the intensity for exercise \(x_i\).


To solve LP-problems you can use different kinds of LP-solvers. A very nice Open Source LP-solver is PULP. I personally prefer the Gurobi LP-solver, it is a very powerful solver that has got a huge variety of features. If you are in academic you can receive an academic license for free. You can find the full code in my GitHub-Repository.

if __name__ == "__main__":
    # Define your intensities manually
    categories, min_intensity, max_intensity = gp.multidict({
        'shoulders' : [0,GRB.INFINITY],
        'back' : [0,GRB.INFINITY],
        'breast' : [0,GRB.INFINITY],
        'biceps' : [0,GRB.INFINITY],
        'triceps' : [0,GRB.INFINITY],
        'abs' : [0,GRB.INFINITY],
        'butt' : [0.5,0.5],
        'legs' : [2,2]

    # Read exercises and their intensity
    exercises_intensities = build_dict_from_csv_file("exercises.csv", "name")
    # Read exercises and their needed time
    exercises, time = gp.multidict(exercise_and_time_dict(exercises_intensities))
    # Build model
    m = gp.Model("circle_training")
    # Create trainings variables (each exercise is a decision variable)
    training = m.addVars(exercises, vtype=GRB.BINARY, name="training")
    # Objective Function
    m.setObjective(, GRB.MINIMIZE)
    # Constraint:
    # shoulders * push ups + shoulders * biceps + ...
    # others * push ups + ....
    m.addConstrs((gp.quicksum(exercises_intensities[e, c] * training[e] for e in exercises)
        == [min_intensity[c], max_intensity[c]]
        for c in categories), "_"
    # Find Solution
    # if there is a solution
    if m.status == GRB.OPTIMAL:
        print("Your training plan:")
        print('\nTime: %g' % m.objVal)
        trainingx = m.getAttr('x', training)
        for f in exercises:
            if trainingx[f] > 0:
                print('%s %g' % (f, trainingx[f]))
        print('No solution')


If you want to learn more:

Optimize your diet via SMT

Are you struggling with what to eat today? Sometimes it is tough to decide, especially if you are hungry and find out there is nothing left in your fridge. I want to show you how it is possible to develop your dish planner and let the computer automatically optimizes your meals according to your daily nutrition intake.

For this program, we use SMT solvers. SMT solvers try to figure out which variable assignment satisfies a given logic formula. If a logic formula is satisfiable, it returns the assigned variables (in our case, your meals). PySMT is a very user-friendly interface for Python developers to encode and solve these kinds of SMT problems. In this post, I give you a small introduction into the field of Satisfiability Modulo Theories by providing you an SMT solver for figuring out what you could eat today.

Ethiopian cuisine


To solve this problem we have to do the following steps:

  1. Create a data set with your favorite dishes (dish name, number of calories, etc. etc.)
  2. Research your daily nutrition intake
  3. Encode a propositional logic formula based on this data set
  4. Let the SMT-solver decide what you should eat

1. Your personalized dish data set

To get your personalized data set, you must figure out how much nutritions are in your dishes. Thanks to the internet there are a variety of websites that provide you with a lot of recipes and the number of nutritions. My favorite ones are:

If you don’t have time to create your data set, I prepared a data set for you. There are many tastes out there, and so I just looked for the most famous food in the world. I ended up with a subset of dishes from CNN travel, and I hope you like it (a lot of them I didn’t know before).

2. Your daily nutrition intake

So how many calories should you eat today? Well, if you’re like me and have got already eaten 3 dishes, much chocolate and haven’t done any sport today, you should probably eat anything. But OK, let’s say the SMT solver should tell you how much you should eat tomorrow. For finding out the exact number of calories you can google for something like “calories calculator” and you may end up on:

On these websites, you can easily calculate your needed number of calories.

3. Formula Encoding

So how can we tell our overweight/optimization problem to our computer? Just with these 4 simple constraints:

(I) \(\sum_{i=1}^{|D|} x_i \cdot d_{i2} \leq (C + \alpha)\\\)

(II) \(\sum_{i=1}^{|D|} x_i \cdot d_{i2} \leq (C – \alpha)\\\)

(III) \(\sum_{i=1}^{|D|} x_i = k\)

(IV) \(\bigwedge\limits_{i}^E x_i = 0\)

\( D \) is a set of dishes. \( d_i \in D \) is a tuple \((dish, calories)\) and represents a single dish \(i\).

\(x \in \{ 0, 1 \}\) is an SMT variable that gets automatically assigned by the SMT solver. If \(x_i = 1\), then we will eat this dish the next day. \(k \in \mathbb{N}\) is the number of dishes per day.

\( C \in \mathbb{R} \) is the allowed number of daily calories. \( \alpha \in \mathbb{R} \) is the allowed deviation of \(C\).

\(B\subseteq \mathbb{N}\) is a set of disabled dish indices for the next calculation.

(I) and (II) make sure that our dishes give us the predefined number of calories. (III) makes sure that we get \(k\) dishes per day. (IV) allows us to exclude specific dishes for the next day.

4. Coding

You can find the full code in my GitHub-Repository. This method builds from lines 15 – 53, all the constraints from Section 3. The SMT solving happens at line 55. If a solution is found, it returns the dishes for cooking.

def nutrition_calculator(dishes, number_of_calories, number_of_dishes, disabled_dishes = [], alpha = 100):
        This method creates a dish plan based on the daily allowed calories
        and the number of dishes.
            dishes, pandas data frame. Each row looks like this: (dish name, calories, proteins).
            number_of_calories, the daily number of calories.
            number_of_dishes, the daily number of dishes.
            disabled_dishes, list of row indizes from dishes which are not allowed in the current calculation.
            alpha, the allowed deviation of calories
            row indizes from dishes data frame. These dishes can be eaten today.
    # Upper and lower calory boundaries
    calories_upper_boundary = Int(int(number_of_calories + alpha))
    calories_lower_boundary = Int(int(number_of_calories - alpha))
    # List of SMT variables
    x = []
    # List of all x_i * d_i2
    x_times_calories = []
    # List of all x_i € {0,1}
    x_zero_or_one = []
    # List of disabled dishes
    x_disabled_sum = []
    for index, row in dishes.iterrows():
        x.append(Symbol('x' + str(index), INT))
        # x_i * d_i2
        x_times_calories.append(Times(x[-1], Int(row.calories)))
        # x_i € {0,1}
        x_zero_or_one.append(Or(Equals(x[-1], Int(0)), Equals(x[-1], Int(1))))
        # Disable potential dishes
        if index in disabled_dishes:
    x_times_calories_sum = Plus(x_times_calories)
    x_sum = Plus(x)
    if len(x_disabled_sum) == 0:
        x_disabled_sum = Int(0)
        x_disabled_sum = Plus(x_disabled_sum)
    formula = And(
            # Makes sure that our calories are above the lower boundary
            GE(x_times_calories_sum, calories_lower_boundary),
            # Makes sure that our calories are below the upper boundary
            LE(x_times_calories_sum, calories_upper_boundary),
            # Makes sure that we get number_of_dishes dishes per day
            Equals(x_sum, Int(int(number_of_dishes))),
            # Makes sure that we don't use the disabled dishes
            Equals(x_disabled_sum, Int(0)),
            # Makes sure that each dish is maximal used once
    # SMT solving
    model = get_model(formula)
    # Get indizes
    if model:
        result_indizes = []
        for i in range(len(x)):
            if(model.get_py_value(x[i]) == 1):
        return result_indizes
        return None

Extensions & More

We can easily add two further constraints to support e.g. proteins in our encoding:

(V) \(\sum_{i=1}^{|D|} x_i \cdot d_{i3} \leq (P + \beta)\\\)

(VI) \(\sum_{i=1}^{|D|} x_i \cdot d_{i3} \leq (P – \beta)\\\)

\( P \in \mathbb{R} \) is the allowed number of proteins. \( \beta \in \mathbb{R} \) is the allowed deviation of \(P\). Don’t forget to update our set of dishes \(D\) and extend our tuple \(d_i\) to \((dish, calories, proteins)\).

If you want to learn more about SMT solving, check out the following links: