TD07 - Knapsack
Implementations in Python 3 of different algorithms for solving the Knapsack problem studied in TD 7.
Exercise plan:
- Backtracking
- Greedy
- Dynamic programming
Each piece of code you need to complete is followed by tests that verify the implemented algorithm is correct. In the following cell, two Knapsack instances are loaded for testing. We use Python dictionaries to represent objects with their weights and values.
# a small instance of Knapsack with 4 objects
W4 = 5  # weight limit of the knapsack
O4 = {
    'o1': {'w': 1, 'v': 10},
    'o2': {'w': 2, 'v': 20},  # 'w' for weight and 'v' for value
    'o3': {'w': 3, 'v': 15},
    'o4': {'w': 4, 'v': 20}
}

# a bigger instance of Knapsack with 15 objects
W15 = 750
O15 = {
    'o1': {'w': 70, 'v': 135},   'o2': {'w': 73, 'v': 139},   'o3': {'w': 77, 'v': 149},
    'o4': {'w': 80, 'v': 150},   'o5': {'w': 82, 'v': 156},   'o6': {'w': 87, 'v': 163},
    'o7': {'w': 90, 'v': 173},   'o8': {'w': 94, 'v': 184},   'o9': {'w': 98, 'v': 192},
    'o10': {'w': 106, 'v': 201}, 'o11': {'w': 110, 'v': 210}, 'o12': {'w': 113, 'v': 214},
    'o13': {'w': 115, 'v': 221}, 'o14': {'w': 118, 'v': 229}, 'o15': {'w': 120, 'v': 240}
}
Backtracking
We recall the common code skeleton of a backtracking algorithm (maximizing a score) as presented in the last lecture:
bestScore, bestSol = -Inf, None

def backtracking(curParSol):
    if terminal(curParSol):
        if score(curParSol) > bestScore:
            bestSol, bestScore = curParSol, score(curParSol)
    else:
        for childParSol in children(curParSol):
            backtracking(childParSol)
It explores the solution space and keeps in memory the best solution found so far. The code above is a recursive depth-first search applied to the solution tree. Here curParSol stands for "current partial solution".
In our case (see the tree figure in TD 7), each node of the tree is a partial solution (a solution under construction) of the Knapsack problem.
Partial solution data structure
To start, we fix an order on the objects and index them from {$0$} to {$n-1$} within a Python list that we call objs_list. For example, we can use the lexicographical order, giving objs_list = ['o1', 'o2', ..., 'on'].
Then, we will represent a partial solution as a Python dictionary with 4 entries:
- 'selected' : a Python set of currently selected objects
- 'index' : the index of the next object to select or not to select within the list objs_list
- 'weight' : the total weight of the selected objects
- 'score' : the total value of the selected objects
One partial solution of the knapsack instance (O15, W15) is:
exampleParSol = {'selected': {'o1', 'o3', 'o4'} , 'index':6, 'weight': 227, 'score': 434}
and the root of the solutions tree is:
rootSol = {'selected': set(), 'index': 0, 'weight': 0, 'score': 0}
Implementation
To complete the following backtracking code, write the children function that computes the list of partial solutions that are children of the partial solution given as a parameter.
def backtracking(O_dict, W):
    global bestSol, objs_list
    # initialization
    bestSol = {'selected': set(), 'weight': 0, 'score': 0}
    objs_list = sorted(O_dict.keys())  # lexicographical order on objects

    # computes the children of the current partial solution
    def children(curParSol):
        childrenSols = []
        ############### TODO : complete code ####################
        return childrenSols

    # a partial solution is terminal when it has no children
    def terminal(curParSol):
        return len(children(curParSol)) == 0

    def backtracking_rec(curParSol):
        global objs_list, bestSol
        if terminal(curParSol):
            if curParSol['score'] > bestSol['score']:
                bestSol = curParSol.copy()
        else:
            for parSol in children(curParSol):
                backtracking_rec(parSol)

    # call backtracking on the root node
    rootSol = {'selected': set(), 'index': 0, 'weight': 0, 'score': 0}
    backtracking_rec(rootSol)
    # return the best found solution (the 'index' entry is no longer relevant)
    bestSol.pop('index', None)
    return bestSol

print('--- BACKTRACKING ---')
bestSol4 = backtracking(O4, W4)
print('--- simple case : 4 objects --- best found solution {} of value {}'.format(bestSol4['selected'], bestSol4['score']))
assert bestSol4['score'] == 35
assert bestSol4['selected'] == {'o2', 'o3'}

bestSol15 = backtracking(O15, W15)
print('--- bigger case : 15 objects --- best found solution {} of value {}'.format(bestSol15['selected'], bestSol15['score']))
assert bestSol15['score'] == 1458
assert bestSol15['selected'] == {'o1', 'o3', 'o5', 'o7', 'o8', 'o9', 'o14', 'o15'}
- A partial solution has at most two children
- A partial solution where the 'index' is equal to {$n$} is terminal and does not have any children
- Do not consider a child solution when it exceeds the weight limit!
- If you copy a partial solution, make sure to copy the inner set as well (a shallow copy shares it):
dic_orig = {'selected': {'o1', 'o3', 'o4'}, 'index': 6, 'weight': 227, 'score': 434}

dic_copy = dic_orig.copy()
assert id(dic_orig) != id(dic_copy)                          # the dictionaries are different
assert id(dic_orig['selected']) == id(dic_copy['selected'])  # but the inner set is the same!
dic_copy['selected'].add('o7')
assert 'o7' in dic_orig['selected']                          # not expected!

dic_copy['selected'] = dic_orig['selected'].copy()           # copy the inner set too
assert id(dic_orig['selected']) != id(dic_copy['selected'])  # now the inner sets are different!
dic_copy['selected'].add('o8')
assert 'o8' not in dic_orig['selected']                      # as expected!

from copy import deepcopy  # or use a deepcopy
dic_deepcopy = deepcopy(dic_orig)
assert id(dic_orig) != id(dic_deepcopy)                          # the dictionaries are different
assert id(dic_orig['selected']) != id(dic_deepcopy['selected'])  # the inner sets are different too!
dic_deepcopy['selected'].add('o9')
assert 'o9' not in dic_orig['selected']                          # as expected!
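To illustrate what the hints above lead to, here is one possible sketch of the children computation, written as a standalone helper make_children (a name chosen here for illustration) that takes the instance and the weight limit explicitly instead of capturing them from the enclosing function:

```python
from copy import deepcopy

def make_children(curParSol, O_dict, objs_list, W):
    """A possible 'children' computation: skip or take the object at 'index'."""
    childrenSols = []
    i = curParSol['index']
    if i == len(objs_list):       # all objects have been decided: terminal node
        return childrenSols
    obj = objs_list[i]
    # child 1: do not take the object
    skip = deepcopy(curParSol)
    skip['index'] = i + 1
    childrenSols.append(skip)
    # child 2: take the object, only if the weight limit is respected
    w, v = O_dict[obj]['w'], O_dict[obj]['v']
    if curParSol['weight'] + w <= W:
        take = deepcopy(curParSol)
        take['selected'].add(obj)
        take['index'] = i + 1
        take['weight'] += w
        take['score'] += v
        childrenSols.append(take)
    return childrenSols
```

Note that each child gets its own deep copy, so modifying one child never leaks into its siblings or its parent, as shown by the copy example above.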
Greedy algorithm
A greedy algorithm constructs a solution step by step without ever going back on decisions already taken. In general, there is no guarantee that the returned solution is optimal.
For the Knapsack problem, an efficient greedy algorithm is studied and established in TD7.
Implementation
Complete the following code for the greedy algorithm.
def greedy(O_dict, W):
    # initialization
    n = len(O_dict)
    sol = {'selected': set(), 'weight': 0, 'score': 0}
    ############### TODO : complete code ####################
    return sol

print("--- GREEDY ---")
bestSol4 = greedy(O4, W4)
print('--- simple case : 4 objects --- best found solution {} of value {}'.format(bestSol4['selected'], bestSol4['score']))
assert bestSol4['score'] == 30
assert bestSol4['selected'] == {'o2', 'o1'}

bestSol15 = greedy(O15, W15)
print('--- bigger case : 15 objects --- best found solution {} of value {}'.format(bestSol15['selected'], bestSol15['score']))
assert bestSol15['score'] == 1441
assert bestSol15['selected'] == {'o1', 'o2', 'o3', 'o7', 'o8', 'o9', 'o14', 'o15'}
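One possible completion, assuming the ratio-based greedy studied in TD7 (consider objects by decreasing value/weight ratio and take each one that still fits), sketched here as a standalone function greedy_knapsack (a name chosen for illustration):

```python
def greedy_knapsack(O_dict, W):
    """Ratio-based greedy sketch: best value-per-weight first, take what fits."""
    sol = {'selected': set(), 'weight': 0, 'score': 0}
    # objects sorted by decreasing value/weight ratio
    by_ratio = sorted(O_dict, key=lambda o: O_dict[o]['v'] / O_dict[o]['w'],
                      reverse=True)
    for o in by_ratio:
        # greedy choice: take the object if it still fits, never reconsider it
        if sol['weight'] + O_dict[o]['w'] <= W:
            sol['selected'].add(o)
            sol['weight'] += O_dict[o]['w']
            sol['score'] += O_dict[o]['v']
    return sol
```

On the small instance (O4, W4) this returns {'o1', 'o2'} with value 30, whereas the optimal value found by backtracking is 35: the greedy choice is fast but not always optimal.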
Benchmark
Using the two benchmarks below, compare the running times of the two algorithms and the quality of their solutions on large random instances. What do you conclude?
import matplotlib.pyplot as plt
from timeit import timeit
from random import randint

# generates a random instance of size N
def random_instance(N):
    W = N**2
    O = {f'o{i}': {'w': randint(1, N**2), 'v': randint(1, N**2)} for i in range(1, N+1)}
    return W, O

def benchmark_time():
    nb_try = 10
    Time_backtracking = []
    Time_greedy = []
    n_list = []

    def ben_bt(N):
        W, O_dict = random_instance(N)
        backtracking(O_dict, W)

    def ben_greedy(N):
        W, O_dict = random_instance(N)
        greedy(O_dict, W)

    for N in range(20, 51, 10):
        n_list.append(N)
        Time_backtracking.append(timeit(lambda: ben_bt(N), number=nb_try))
        Time_greedy.append(timeit(lambda: ben_greedy(N), number=nb_try))

    plt.xlabel('N')
    plt.ylabel('T')
    plt.plot(n_list, Time_backtracking, 'r^', label='backtracking')
    plt.plot(n_list, Time_greedy, 'b^', label='greedy')
    plt.legend()
    plt.show()

def benchmark_value():
    for N in [10, 20, 30, 40]:
        W_r, O_r = random_instance(N)
        res_gr = greedy(O_r, W_r)
        res_bk = backtracking(O_r, W_r)
        print('--- random instance of size {} --- value greedy / value backtracking : {}/{}'.format(N, res_gr['score'], res_bk['score']))
Dynamic programming
We have {$n$} objects {$(o_1, \ldots, o_{n})$} indexed from {$1$} to {$n$}. We denote by {$V(i, j)$} the maximum total value that can be inserted into a knapsack of capacity {$j$} when picking objects among {$(o_1, \ldots, o_i)$}. The recurrence formula is established in the TD.
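For reference, a standard statement of this recurrence (with the convention that {$V(0, j) = 0$} for all {$j$}, and writing {$w_i$}, {$v_i$} for the weight and value of {$o_i$}) is:

```latex
V(i, j) =
\begin{cases}
0 & \text{if } i = 0 \\
V(i-1, j) & \text{if } w_i > j \\
\max\bigl(V(i-1, j),\; V(i-1, j - w_i) + v_i\bigr) & \text{otherwise}
\end{cases}
```

The two arguments of the max correspond to the two children of the backtracking tree: not taking object {$o_i$}, or taking it when it fits.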
Implementation
Complete the function Knapsack_DP by:
- filling the matrix of values V of size {$(n+1) \times (W+1)$} where the last cell V[n,W] is the value of the optimal solution.
- retrieving the solution (the set of selected objects) from the matrix V.
import numpy as np

def Knapsack_DP(O_dict, W):
    n = len(O_dict)
    objs_list = sorted(O_dict.keys())

    # FILLING THE TABLE ITERATIVELY
    V = np.zeros((n+1, W+1), dtype='int32')
    ############### TODO : complete code ####################

    # RETRIEVE THE SOLUTION
    selected = set()
    ############### TODO : complete code ####################

    return {'selected': selected, 'score': V[n, W]}

print("--- DYNAMIC PROGRAMMING ---")
bestSol4 = Knapsack_DP(O4, W4)
print('--- simple case : 4 objects --- best found solution {} of value {}'.format(bestSol4['selected'], bestSol4['score']))
assert bestSol4['score'] == 35
assert bestSol4['selected'] == {'o2', 'o3'}

bestSol15 = Knapsack_DP(O15, W15)
print('--- bigger case : 15 objects --- best found solution {} of value {}'.format(bestSol15['selected'], bestSol15['score']))
assert bestSol15['score'] == 1458
assert bestSol15['selected'] == {'o1', 'o3', 'o5', 'o7', 'o8', 'o9', 'o14', 'o15'}
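As a point of comparison, here is one possible completion sketch, written as a standalone function knapsack_dp_sketch (a name chosen for illustration): it fills the table row by row following the recurrence, then walks the table backwards to recover the selected objects.

```python
import numpy as np

def knapsack_dp_sketch(O_dict, W):
    """DP sketch: fill V bottom-up, then backtrack from V[n, W]."""
    n = len(O_dict)
    objs_list = sorted(O_dict.keys())
    V = np.zeros((n + 1, W + 1), dtype='int32')
    # filling the table: row i uses only row i-1
    for i in range(1, n + 1):
        w, v = O_dict[objs_list[i - 1]]['w'], O_dict[objs_list[i - 1]]['v']
        for j in range(W + 1):
            V[i, j] = V[i - 1, j]                       # without object i
            if w <= j:                                  # with object i, if it fits
                V[i, j] = max(V[i, j], V[i - 1, j - w] + v)
    # retrieving the solution: if V[i, j] differs from V[i-1, j],
    # object i must have been taken
    selected, j = set(), W
    for i in range(n, 0, -1):
        if V[i, j] != V[i - 1, j]:
            selected.add(objs_list[i - 1])
            j -= O_dict[objs_list[i - 1]]['w']
    return {'selected': selected, 'score': int(V[n, W])}
```

Unlike backtracking, the running time is O(nW), polynomial in the capacity rather than exponential in the number of objects.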
Benchmark [advanced]
If you still have time, compare the performances of the three resolution algorithms. What do you observe?