TD07 - Knapsack
Implementations in Python 3 of different algorithms for solving the Knapsack problem studied in TD 7.
Exercise plan:
- Backtracking
- Greedy
- Dynamic programming
Each piece of code you need to complete is followed by tests that verify the implemented algorithm is correct. In the following cell, two Knapsack instances are loaded for testing. We use Python dictionaries to represent objects with their weights and values.
# a small instance of Knapsack with 4 objects
W4 = 5  # weight limit of the knapsack
O4 = {
    'o1': {'w': 1, 'v': 10},
    'o2': {'w': 2, 'v': 20},  # 'w' for weight and 'v' for value
    'o3': {'w': 3, 'v': 15},
    'o4': {'w': 4, 'v': 20}
}

# a bigger instance of Knapsack with 15 objects
W15 = 750
O15 = {
    'o1': {'w': 70, 'v': 135},   'o2': {'w': 73, 'v': 139},   'o3': {'w': 77, 'v': 149},
    'o4': {'w': 80, 'v': 150},   'o5': {'w': 82, 'v': 156},   'o6': {'w': 87, 'v': 163},
    'o7': {'w': 90, 'v': 173},   'o8': {'w': 94, 'v': 184},   'o9': {'w': 98, 'v': 192},
    'o10': {'w': 106, 'v': 201}, 'o11': {'w': 110, 'v': 210}, 'o12': {'w': 113, 'v': 214},
    'o13': {'w': 115, 'v': 221}, 'o14': {'w': 118, 'v': 229}, 'o15': {'w': 120, 'v': 240}
}
Backtracking
We recall the common code skeleton of a backtracking algorithm (maximizing a score) as presented in the last lecture:
bestScore, bestSol = -Inf, None

def backtracking(curParSol):
    if terminal(curParSol):
        if score(curParSol) > bestScore:
            bestSol, bestScore = curParSol, score(curParSol)
    else:
        for childParSol in children(curParSol):
            backtracking(childParSol)
It explores the solution space and keeps in memory the best solution found so far. The code above is a recursive depth-first search applied to the solution tree. Here curParSol stands for "current partial solution".
In our case (see the tree figure in TD 7), each node of the tree is a partial solution (a solution under construction) of the Knapsack problem.
Partial solution data structure
To start, we fix an order on the objects and index them from {$0$} to {$n-1$} within a Python list that we call objs_list. For example, we can use the lexicographical order, giving objs_list = ['o1', 'o2', ..., 'on'].
Then, we will represent a partial solution as a Python dictionary with 4 entries:
- 'selected' : a Python set of currently selected objects
- 'index' : the index of the next object to select or not to select within the list objs_list
- 'weight' : the total weight of the selected objects
- 'score' : the total value of the selected objects
One partial solution of the knapsack instance (O15, W15) is:
exampleParSol = {'selected': {'o1', 'o3', 'o4'} , 'index':6, 'weight': 227, 'score': 434}
and the root of the solutions tree is:
rootSol = {'selected': set(), 'index': 0, 'weight': 0, 'score': 0}
Implementation
To complete the following backtracking code, write the children function that computes the list of partial solutions that are children of the partial solution given as a parameter.
def backtracking(O_dict, W):
    global bestSol, objs_list
    # initialization
    bestSol = {'selected': set(), 'weight': 0, 'score': 0}
    objs_list = sorted(O_dict.keys())  # lexicographical order on objects

    # computes the children of the current partial solution
    def children(curParSol):
        childrenSols = []
        ############### TODO : complete code ####################
        return childrenSols

    # a partial solution is terminal when it has no children
    def terminal(curParSol):
        return len(children(curParSol)) == 0

    def backtracking_rec(curParSol):
        global objs_list, bestSol
        if terminal(curParSol):
            if curParSol['score'] > bestSol['score']:
                bestSol = curParSol.copy()
        else:
            for parSol in children(curParSol):
                backtracking_rec(parSol)

    # call backtracking on the root node
    rootSol = {'selected': set(), 'index': 0, 'weight': 0, 'score': 0}
    backtracking_rec(rootSol)
    # return the best found solution (the 'index' entry is no longer relevant)
    bestSol.pop('index', None)
    return bestSol

print('--- BACKTRACKING ---')
bestSol4 = backtracking(O4, W4)
print('--- simple case : 4 objects --- best found solution {} of value {}'.format(bestSol4['selected'], bestSol4['score']))
assert bestSol4['score'] == 35
assert bestSol4['selected'] == {'o2', 'o3'}

bestSol15 = backtracking(O15, W15)
print('--- bigger case : 15 objects --- best found solution {} of value {}'.format(bestSol15['selected'], bestSol15['score']))
assert bestSol15['score'] == 1458
assert bestSol15['selected'] == {'o1', 'o3', 'o5', 'o7', 'o8', 'o9', 'o14', 'o15'}
- A partial solution has at most two children
- A partial solution where the 'index' is equal to {$n$} is terminal and does not have any children
- Do not consider a child solution when it exceeds the weight limit!
- If you copy a partial solution, make sure to copy the inner set as well (a shallow copy shares it):
dic_orig = {'selected': {'o1', 'o3', 'o4'}, 'index': 6, 'weight': 227, 'score': 434}

dic_copy = dic_orig.copy()
assert id(dic_orig) != id(dic_copy)                          # the dictionaries are different
assert id(dic_orig['selected']) == id(dic_copy['selected'])  # but the inner set is the same!
dic_copy['selected'].add('o7')
assert 'o7' in dic_orig['selected']                          # not expected!

dic_copy['selected'] = dic_orig['selected'].copy()           # copy the inner set too
assert id(dic_orig['selected']) != id(dic_copy['selected'])  # now the inner sets are different!
dic_copy['selected'].add('o8')
assert 'o8' not in dic_orig['selected']                      # as expected!

from copy import deepcopy  # or use a deepcopy
dic_deepcopy = deepcopy(dic_orig)
assert id(dic_orig) != id(dic_deepcopy)                          # the dictionaries are different
assert id(dic_orig['selected']) != id(dic_deepcopy['selected'])  # the inner sets are different too!
dic_deepcopy['selected'].add('o9')
assert 'o9' not in dic_orig['selected']                          # as expected!
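To illustrate what the hints above lead to, here is one possible sketch of the children computation, written as a standalone helper make_children (a name chosen here for illustration) that takes the instance and the weight limit explicitly instead of capturing them from the enclosing function:

```python
from copy import deepcopy

def make_children(curParSol, O_dict, objs_list, W):
    """A possible 'children' computation: skip or take the object at 'index'."""
    childrenSols = []
    i = curParSol['index']
    if i == len(objs_list):       # all objects have been decided: terminal node
        return childrenSols
    obj = objs_list[i]
    # child 1: do not take the object
    skip = deepcopy(curParSol)
    skip['index'] = i + 1
    childrenSols.append(skip)
    # child 2: take the object, only if the weight limit is respected
    w, v = O_dict[obj]['w'], O_dict[obj]['v']
    if curParSol['weight'] + w <= W:
        take = deepcopy(curParSol)
        take['selected'].add(obj)
        take['index'] = i + 1
        take['weight'] += w
        take['score'] += v
        childrenSols.append(take)
    return childrenSols
```

Note that each child gets its own deep copy, so modifying one child never leaks into its siblings or its parent, as shown by the copy example above.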
Greedy algorithm
A greedy algorithm constructs a solution step by step without ever going back on decisions already taken. In general, there is no guarantee that the returned solution is optimal.
For the Knapsack problem, an efficient greedy algorithm is studied and established in TD7.
Implementation
Complete the following code for the greedy algorithm.
def greedy(O_dict, W):
    # initialization
    n = len(O_dict)
    sol = {'selected': set(), 'weight': 0, 'score': 0}
    ############### TODO : complete code ####################
    return sol

print("--- GREEDY ---")
bestSol4 = greedy(O4, W4)
print('--- simple case : 4 objects --- best found solution {} of value {}'.format(bestSol4['selected'], bestSol4['score']))
assert bestSol4['score'] == 30
assert bestSol4['selected'] == {'o2', 'o1'}

bestSol15 = greedy(O15, W15)
print('--- bigger case : 15 objects --- best found solution {} of value {}'.format(bestSol15['selected'], bestSol15['score']))
assert bestSol15['score'] == 1441
assert bestSol15['selected'] == {'o1', 'o2', 'o3', 'o7', 'o8', 'o9', 'o14', 'o15'}
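One possible completion, assuming the ratio-based greedy studied in TD7 (consider objects by decreasing value/weight ratio and take each one that still fits), sketched here as a standalone function greedy_knapsack (a name chosen for illustration):

```python
def greedy_knapsack(O_dict, W):
    """Ratio-based greedy sketch: best value-per-weight first, take what fits."""
    sol = {'selected': set(), 'weight': 0, 'score': 0}
    # objects sorted by decreasing value/weight ratio
    by_ratio = sorted(O_dict, key=lambda o: O_dict[o]['v'] / O_dict[o]['w'],
                      reverse=True)
    for o in by_ratio:
        # greedy choice: take the object if it still fits, never reconsider it
        if sol['weight'] + O_dict[o]['w'] <= W:
            sol['selected'].add(o)
            sol['weight'] += O_dict[o]['w']
            sol['score'] += O_dict[o]['v']
    return sol
```

On the small instance (O4, W4) this returns {'o1', 'o2'} with value 30, whereas the optimal value found by backtracking is 35: the greedy choice is fast but not always optimal.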
Benchmark
Using the two benchmarks below, compare the running times of the two algorithms and the quality of their solutions on large random instances. What do you conclude?
import matplotlib.pyplot as plt
from timeit import timeit
from random import randint

# generates a random instance of size N
def random_instance(N):
    W = N**2
    O = {f'o{i}': {'w': randint(1, N**2), 'v': randint(1, N**2)} for i in range(1, N+1)}
    return W, O

def benchmark_time():
    nb_try = 10
    Time_backtracking = []
    Time_greedy = []
    n_list = []

    def ben_bt(N):
        W, O_dict = random_instance(N)
        backtracking(O_dict, W)

    def ben_greedy(N):
        W, O_dict = random_instance(N)
        greedy(O_dict, W)

    for N in range(20, 51, 10):
        n_list.append(N)
        Time_backtracking.append(timeit(lambda: ben_bt(N), number=nb_try))
        Time_greedy.append(timeit(lambda: ben_greedy(N), number=nb_try))

    plt.xlabel('N')
    plt.ylabel('T')
    plt.plot(n_list, Time_backtracking, 'r^', label='backtracking')
    plt.plot(n_list, Time_greedy, 'b^', label='greedy')
    plt.legend()
    plt.show()

def benchmark_value():
    for N in [10, 20, 30, 40]:
        W_r, O_r = random_instance(N)
        res_gr = greedy(O_r, W_r)
        res_bk = backtracking(O_r, W_r)
        print('--- random instance of size {} --- value greedy / value backtracking : {}/{}'.format(N, res_gr['score'], res_bk['score']))
Dynamic programming
We have {$n$} objects {$(o_1, \ldots, o_{n})$} indexed from {$1$} to {$n$}. We denote by {$V(i, j)$} the maximum total value that can be inserted into a knapsack of capacity {$j$} when picking objects among {$(o_1, \ldots, o_i)$}. The recurrence formula is established in the TD.
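For reference, a standard statement of this recurrence (with the convention that {$V(0, j) = 0$} for all {$j$}, and writing {$w_i$}, {$v_i$} for the weight and value of {$o_i$}) is:

```latex
V(i, j) =
\begin{cases}
0 & \text{if } i = 0 \\
V(i-1, j) & \text{if } w_i > j \\
\max\bigl(V(i-1, j),\; V(i-1, j - w_i) + v_i\bigr) & \text{otherwise}
\end{cases}
```

The two arguments of the max correspond to the two children of the backtracking tree: not taking object {$o_i$}, or taking it when it fits.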
Implementation
Complete the function Knapsack_DP by:
- filling the matrix of values V of size {$(n+1) \times (W+1)$} where the last cell V[n,W] is the value of the optimal solution.
- retrieving the solution (the set of selected objects) from the matrix V.
import numpy as np

def Knapsack_DP(O_dict, W):
    n = len(O_dict)
    objs_list = sorted(O_dict.keys())

    # FILLING THE TABLE ITERATIVELY
    V = np.zeros((n+1, W+1), dtype='int32')
    ############### TODO : complete code ####################

    # RETRIEVE THE SOLUTION
    selected = set()
    ############### TODO : complete code ####################

    return {'selected': selected, 'score': V[n, W]}

print("--- DYNAMIC PROGRAMMING ---")
bestSol4 = Knapsack_DP(O4, W4)
print('--- simple case : 4 objects --- best found solution {} of value {}'.format(bestSol4['selected'], bestSol4['score']))
assert bestSol4['score'] == 35
assert bestSol4['selected'] == {'o2', 'o3'}

bestSol15 = Knapsack_DP(O15, W15)
print('--- bigger case : 15 objects --- best found solution {} of value {}'.format(bestSol15['selected'], bestSol15['score']))
assert bestSol15['score'] == 1458
assert bestSol15['selected'] == {'o1', 'o3', 'o5', 'o7', 'o8', 'o9', 'o14', 'o15'}
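As a point of comparison, here is one possible completion sketch, written as a standalone function knapsack_dp_sketch (a name chosen for illustration): it fills the table row by row following the recurrence, then walks the table backwards to recover the selected objects.

```python
import numpy as np

def knapsack_dp_sketch(O_dict, W):
    """DP sketch: fill V bottom-up, then backtrack from V[n, W]."""
    n = len(O_dict)
    objs_list = sorted(O_dict.keys())
    V = np.zeros((n + 1, W + 1), dtype='int32')
    # filling the table: row i uses only row i-1
    for i in range(1, n + 1):
        w, v = O_dict[objs_list[i - 1]]['w'], O_dict[objs_list[i - 1]]['v']
        for j in range(W + 1):
            V[i, j] = V[i - 1, j]                       # without object i
            if w <= j:                                  # with object i, if it fits
                V[i, j] = max(V[i, j], V[i - 1, j - w] + v)
    # retrieving the solution: if V[i, j] differs from V[i-1, j],
    # object i must have been taken
    selected, j = set(), W
    for i in range(n, 0, -1):
        if V[i, j] != V[i - 1, j]:
            selected.add(objs_list[i - 1])
            j -= O_dict[objs_list[i - 1]]['w']
    return {'selected': selected, 'score': int(V[n, W])}
```

Unlike backtracking, the running time is O(nW), polynomial in the capacity rather than exponential in the number of objects.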
Benchmark [advanced]
If you still have time, compare the performances of the three resolution algorithms. What do you observe?