CentraleSupélec, Département informatique
Plateau de Moulon
3 rue Joliot-Curie
F-91192 Gif-sur-Yvette cedex
1CC1000 - Information Systems and Programming - Lab: Security, Cryptography

Password Manager

The objective of this lesson is to address security issues. We will cover the following points:

  • what constitutes a strong password;
  • how to process data in binary form;
  • how to encrypt data.

We will address all of these questions by building a small password manager.

In France, two organizations are particularly involved in data security. On the one hand, the CNIL (National Commission for Informatics and Civil Liberties) contributes to the regulatory framework and ensures compliance with it. On the other hand, the ANSSI (National Agency for the Security of Information Systems) provides technical assistance to government agencies, businesses, and individuals.

These organizations are excellent sources for finding best practices to combat data breaches. This tutorial relies on their recommendations.

Using strong passwords

Choosing a strong password is crucial for the protection of your data. Additionally, it is often recommended to use different passwords for the various online services we use. This helps limit the risks in case of a data breach.

Pay particular attention to the password of your email account, since it can often be used to reset all your other passwords.

Password strength

The strength (or robustness) of a password measures its ability to withstand a brute-force attack, that is, an enumeration of all possible passwords. It depends on the length {$L$} of the password and on the size {$N$} of the alphabet its symbols are drawn from. When the password is chosen at random, this measure is its entropy in bits, given by the formula {$F=\log_2(N^L)=L\log_2 N$}.

The strength of a password can be estimated as follows:

  • very weak if {$F < 64$};
  • weak if {$F < 80$};
  • medium if {$F < 100$};
  • strong if {$F \geq 100$}.

Using a small Python function, we can compute the password length required for a given alphabet size. A password made only of digits ({$N=10$}) needs 25 symbols to reach medium strength, whereas a password mixing digits, lowercase letters, uppercase letters, and punctuation needs only 13 symbols.

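from math import log2

def force(L, N):
    # Entropy in bits of a random password of length L over an alphabet of N symbols.
    # L * log2(N) equals log2(N**L) but avoids the float overflow of pow(N, L) for long passwords.
    return int(L * log2(N))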
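The function that finds the required length is not shown above. A minimal sketch, assuming the thresholds listed earlier and using the hypothetical name required_length, returns the smallest {$L$} such that {$L\log_2 N \geq 80$}, the lower bound of medium strength:

```python
from math import ceil, log2

def required_length(N, target=80):
    """Smallest length L such that L * log2(N) >= target bits (hypothetical helper)."""
    return ceil(target / log2(N))

print(required_length(10))                 # 25: digits only
print(required_length(10 + 26 + 26 + 15))  # 13: digits, both letter cases, 15 punctuation symbols
```

Dividing the target by the entropy per symbol and rounding up avoids trying every length one by one.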

Test the password strength

Most websites require a password to include a lowercase letter, an uppercase letter, a digit, and a punctuation symbol. Therefore, at least 13 symbols are needed for a randomly chosen password to be of medium strength.

Write a function is_strong_enough(password) that takes a password as a parameter and returns True if it contains at least one lowercase letter, one uppercase letter, one digit, and one punctuation symbol, and False otherwise.

Here are the punctuation symbols that we will consider: punct = ".:;!?/()&#@_-*%"


You can use the str methods isxxx(), such as islower(), isupper() and isdigit(), described in the Python documentation of the str type, to check the category of a character.


ANSWER ELEMENTS

punct = ".:;!?/()&#@_-*%"

def is_strong_enough(password):
    # At least one character from each of the four categories
    return (any(c.islower() for c in password)
            and any(c.isupper() for c in password)
            and any(c in punct for c in password)
            and any(c.isdigit() for c in password))

Password generation

To generate passwords, we need to select symbols at random from a set of symbols. The function choice from the module random (see the Python documentation of random) does exactly this: it takes a sequence as a parameter and returns one of its elements, chosen at random.

To implement this function, Python needs to generate random values. The problem is that computers are deterministic, which makes them ill-suited to producing truly random sequences. In practice, they generate pseudo-random sequences: sequences that look random but are computed deterministically from an initial value called the seed. The module random provides fast pseudo-random functions that are suitable for simulation but not for cryptography, because their output can be predicted.

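For our purpose, we will use the function choice from the module secrets (see its documentation). It works in the same way as its counterpart in the module random, but it relies on the cryptographically secure random source provided by the operating system, so its output cannot be predicted.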
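A minimal sketch of the difference, using only the standard library: seeding random with the same value replays exactly the same "random" password, whereas secrets offers no seed at all. The seed value 2024 is arbitrary.

```python
import random
import secrets
import string

alphabet = string.ascii_letters + string.digits

# Same seed, same sequence: anyone who knows or guesses the seed can replay it
random.seed(2024)
first = "".join(random.choice(alphabet) for _ in range(13))
random.seed(2024)
second = "".join(random.choice(alphabet) for _ in range(13))
print(first == second)  # True

# secrets.choice draws from the operating system's secure source and cannot be seeded
token = "".join(secrets.choice(alphabet) for _ in range(13))
print(len(token))       # 13
```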

Write a function generate_password(size=13) that returns a randomly generated password that meets the size and strength criteria.

  • The simplest solution, and the one that preserves the most entropy, is to repeatedly generate a random password until it meets the strength constraint.

The module string includes the following sequences that facilitate the implementation: digits and ascii_letters.

Note

A password generated in this way is robust, but it can be difficult to remember. At the end of this topic, we will introduce another method for generating strong passwords that are easier to memorize: the Diceware method.

If you have enough time, you will have the opportunity to discover it. Otherwise, we encourage you to explore it later on.


ANSWER ELEMENTS

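import secrets
import string

def generate_password(size=13):
    # Alphabet: digits, lowercase and uppercase letters, punctuation symbols
    alphabet = string.digits + string.ascii_letters + punct
    password = ""
    # Draw whole new passwords until one contains all four categories
    while not is_strong_enough(password):
        password = "".join(secrets.choice(alphabet) for _ in range(size))
    return password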

Character encoding

Before diving into encryption, we must first discuss character encoding.

A file on a hard disk is a sequence of bits. This sequence has no inherent meaning; it all depends on how we interpret it. When we talk about a "text" file, we choose to interpret the bytes of the file as characters. However, there is no single way to do this. A character encoding is the table that associates each character with a specific sequence of one or more bytes.

For historical reasons, related to the handling of the alphabets of different languages, several encodings coexist. Among the most common in Europe are latin-1 and utf-8.

We can convert a character string into bytes using the method encode, which takes the desired encoding as an argument. Without an argument, it uses utf-8, the current standard. For example, evaluating 'E-e-é'.encode() in an interpreter gives the following result: b'E-e-\xc3\xa9'.

What we observe is not a character string but a sequence of bytes. We know this because of the letter b to the left of the first apostrophe. Between the apostrophes, the interpreter displays the bytes in a condensed form that resembles a character string but is not one.

This representation is read as follows. A character that is not preceded by a \ stands for its own byte in the utf-8 encoding. For example, the letter a is encoded in utf-8 as the decimal value 97 (61 in hexadecimal), so b"a" is the byte 01100001 in binary.

The two characters following \x represent one byte written in hexadecimal. The values b"\x00" and b"\xff" correspond to the bytes 00000000 and 11111111. Note that b"\x61" is the byte 01100001, exactly like b"a".

Conversely, we can use the method decode to convert a sequence of bytes into a character string. If no encoding is specified, utf-8 is used.
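The following sketch, using only built-in methods, checks the claims above and shows what happens when bytes are decoded with the wrong encoding (the garbled result is often called mojibake):

```python
data = "E-e-é".encode()          # utf-8 by default
print(data)                      # b'E-e-\xc3\xa9'
print(b"a" == b"\x61")           # True: the same byte, 97 in decimal
print(list(data))                # [69, 45, 101, 45, 195, 169]
print(data.decode())             # E-e-é
print(data.decode("latin-1"))    # E-e-Ã©  (each utf-8 byte of é read as one latin-1 character)
```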

Open a Python shell and execute the following instructions. You will see that, depending on the encoding, the same character is not always represented in the same way, and may take more or fewer bytes.

[bin(byte) for byte in "E-e-é".encode("latin-1")]
[bin(byte) for byte in "E-e-é".encode("utf-8")]
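For comparison, a sketch that counts the bytes and prints the binary form of é in both encodings. The euro sign is an extra example, chosen because latin-1 cannot encode it at all:

```python
text = "E-e-é"
latin = text.encode("latin-1")
utf8 = text.encode("utf-8")
print(len(latin), len(utf8))                  # 5 6
print(format(latin[-1], "08b"))               # 11101001: é is one byte in latin-1
print([format(b, "08b") for b in utf8[-2:]])  # ['11000011', '10101001']: two bytes in utf-8

try:
    "€".encode("latin-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False   # latin-1 has no byte for the euro sign
print(euro_in_latin1)        # False
print("€".encode("utf-8"))   # b'\xe2\x82\xac': three bytes in utf-8
```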

Keychain

To introduce encryption, we will create a rudimentary password manager.

For the following exercises, you need the Python package cryptography, which you can install using pip:

python -m pip install cryptography
(depending on your system, use py or python3 instead of python)

The program is run from the command line with the following commands:

  • python pykey.py get name displays the password associated with the website name;
  • python pykey.py set name generates a new password for the website name, replacing any existing one.

Any other command displays a help message explaining how to use the program.

Download the Python file pykey.py; we will complete it step by step.

Comments on the code

The functions is_strong_enough, generate_password, get_password and set_password are currently empty; we will complete them later.

We will analyze how the functions load_passwords and save_passwords work later; for now, their names are self-explanatory: they load and save a password database. This database is simply a dictionary that associates a string (the name of a website) with another string (a password). For example:

{
    "google.fr" : "?NFiuhe875",
    "centralesupelec.fr" : "746405nIUbdeQCp!"
}

The function main contains all the calls. It starts by checking the list of parameters passed when the program is called (grouped by the interpreter in the sys.argv list). If the number or nature of the parameters is incorrect, it displays a help message and exits the program (print_help). If everything is correct, then the following three instructions will be executed:

db = load_passwords(key)
action(db, sys.argv[2])
save_passwords(db, key)

These instructions allow:

  • loading the database contained in the keychain.dat file;
  • performing an action, which is either get_password or set_password (note the use of a variable to store a function and call it later);
  • saving the password database.
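Storing a function in a variable works because functions are ordinary objects in Python. A minimal sketch with two hypothetical functions, greet and shout, shows the same mechanism, here with a dictionary in place of the if/elif chain used in main:

```python
def greet(name):
    return f"Hello {name}"

def shout(name):
    return f"HELLO {name.upper()}!"

# Map each command name to a function object (not to the result of calling it)
actions = {"greet": greet, "shout": shout}

action = actions["shout"]   # choose the function now...
print(action("pykey"))      # ...and call it later: HELLO PYKEY!
```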

For now, everything is stored in plain text, and the variable key is not used yet.

Generate a new password

Complete the functions generate_password(size=13) and is_strong_enough(password) with the code from the previous exercise.


Implement the function set_password(db, name) that takes the password dictionary db and the string name as parameters. This function generates a new random password. It associates the password with name in the db dictionary and displays the password on the screen. If an entry already exists in the dictionary for the name value, it will be replaced.

You can check that your function works correctly by examining the contents of the file keychain.dat. For the example dictionary shown above, the file should contain the following text, where the § symbol separates each website from its password.

google.fr§?NFiuhe875
centralesupelec.fr§746405nIUbdeQCp!

ANSWER ELEMENTS

def set_password(db, name):
    p = generate_password()
    db[name] = p
    print(f'New password for "{name}" is set: {p}')

Get a password

Complete the function get_password(db, name) so that it displays the password associated with name in the db dictionary.

If there is no entry for name, you can display the list of entries present in the dictionary (the available websites, without their passwords).


ANSWER ELEMENTS

def get_password(db, name):
    if name in db:
        print(f'Password for "{name}" : {db[name]}')
    else:
        print(f'No password found for "{name}". Available entries are:')
        for n in db:
            print(f"- {n}")

Encryption/Decryption

We now have a rudimentary but functional program. The problem is that our passwords are stored in plain text, which is insecure. We will therefore protect our data with a symmetric encryption algorithm, meaning that the same key is used for encryption and decryption.

Let's proceed step by step. We will start with a hard-coded key generated beforehand and then see how to generate it from a password.

Let's take a look at the code of the function save_passwords. The principle is as follows:

  1. transform each dictionary entry into a line containing the key and the value separated by the delimiter @§@;
  2. encode all the lines into binary form;
  3. open the save file in binary write mode;
  4. write the data.
def save_passwords(db, key):
    data = "\n".join(w + delimiter + p for w, p in db.items())   #1.
    data = data.encode()                                         #2.

    with open(keychain_path, 'wb') as f:                         #3.
        f.write(data)                                            #4.

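To encrypt a sequence of bytes in Python, we can use the class Fernet from the module cryptography.fernet of the cryptography library (see its documentation). Fernet implements authenticated symmetric encryption: it uses the AES algorithm with a 128-bit key in CBC mode, combined with an HMAC-SHA256 signature that detects any tampering with the encrypted data.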

For encryption, the following two instructions are sufficient. Before their execution, the variable data contains plaintext in the form of bytes. After their execution, the variable data will contain encrypted data.

fernet = Fernet(key)
data = fernet.encrypt(data)
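A minimal self-contained round trip, assuming only that the cryptography package is installed. The key is generated with Fernet.generate_key() rather than taken from pykey.py, and the plaintext is an arbitrary example:

```python
from cryptography.fernet import Fernet, InvalidToken

key = Fernet.generate_key()        # 32 random bytes, url-safe base64-encoded (44 characters)
fernet = Fernet(key)

plaintext = "google.fr§?NFiuhe875".encode()
token = fernet.encrypt(plaintext)  # encrypted and signed bytes, themselves base64-encoded
print(token != plaintext)                  # True
print(fernet.decrypt(token) == plaintext)  # True

# Decrypting with any other key fails loudly instead of returning garbage
other = Fernet(Fernet.generate_key())
try:
    other.decrypt(token)
    wrong_key_rejected = False
except InvalidToken:
    wrong_key_rejected = True
print(wrong_key_rejected)                  # True
```

The InvalidToken exception seen here is the same one the program raises later when the master password is wrong.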

Encryption

Modify the function save_passwords(db, key) by introducing the encryption.

If everything is done correctly, after running your program, you should no longer be able to understand what is stored in the file keychain.dat.

Note that until load_passwords(key) is modified as well, the program will raise an error whenever it reads the keychain file back.


ANSWER ELEMENTS

def save_passwords(db, key):
    data = "\n".join(w + delimiter + p for w, p in db.items())
    data = data.encode()

    fernet = Fernet(key)
    encrypted = fernet.encrypt(data)

    with open(keychain_path, 'wb') as f:
        f.write(encrypted)

Decryption

The function load_passwords(key) is slightly more complex than the previous one, but the modification required is not any harder. Here is how it works:

  1. If the keychain.dat file does not exist, create one by saving an empty dictionary.
  2. Open the file in binary read mode.
  3. Read all the binary data.
  4. Convert the read bytes into a string.
  5. Initialize an empty dictionary.
  6. Instruct Python to transform the string into a "stream of lines" that can be iterated.
  7. Clean and split the lines using the delimiter.
  8. Use each piece of the lines to populate the dictionary.
def load_passwords(key):
    if not os.path.isfile(keychain_path):       #1.
        save_passwords({}, key)

    with open(keychain_path, 'rb') as f:        #2.
        data = f.read()                         #3.
    data = data.decode()                        #4. 

    db = {}                                     #5.
    for l in io.StringIO(data):                 #6.
        s = l.strip().split(delimiter)          #7.
        db[s[0]] = s[1]                         #8.
    return db
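To experiment with this file format without touching keychain.dat, here is a sketch of the same serialization and parsing as save_passwords and load_passwords. The helper names serialize and parse are hypothetical, and the delimiter value is an assumption: use the value defined in pykey.py. Unlike the original, it splits only on the first delimiter and skips blank lines.

```python
import io

delimiter = "@§@"  # assumption: replace with the value of delimiter in pykey.py

def serialize(db):
    """One 'site<delimiter>password' line per entry, as in save_passwords."""
    return "\n".join(w + delimiter + p for w, p in db.items())

def parse(text):
    """Iterate over io.StringIO line by line, as in load_passwords."""
    db = {}
    for line in io.StringIO(text):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        site, password = line.split(delimiter, 1)  # split on the first delimiter only
        db[site] = password
    return db

example = {"google.fr": "?NFiuhe875", "centralesupelec.fr": "746405nIUbdeQCp!"}
print(parse(serialize(example)) == example)  # True
```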

The two instructions required to modify this code and decrypt an encrypted file are as follows:

fernet = Fernet(key)
data = fernet.decrypt(data)

Modify the function load_passwords(key). If everything is done correctly, your program should work again.


ANSWER ELEMENTS

def load_passwords(key):
    if not os.path.isfile(keychain_path):
        save_passwords({}, key)

    with open(keychain_path, 'rb') as f:
        data = f.read()

    fernet = Fernet(key)
    data = fernet.decrypt(data)
    data = data.decode()

    db = {}
    for l in io.StringIO(data):
        s = l.strip().split(delimiter)
        db[s[0]] = s[1]
    return db

Generate a key

Our program is currently insecure, because the encryption key is written in plain text in the source code. The following function solves this by generating a key from a password.

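import base64

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def generate_key(password):
    # salt is a global variable (bytes) defined in pykey.py
    password = password.encode()
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(),
                     length=32,
                     salt=salt,
                     iterations=100000,
                     backend=default_backend())
    return base64.urlsafe_b64encode(kdf.derive(password))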

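These lines come, almost verbatim, from the documentation of the cryptography library. Without going into too much detail, PBKDF2HMAC derives a key from the password. To do this, it combines the password with a value called the salt, using the HMAC-SHA256 function. The salt is a configuration value that does not need to be kept secret. This computation is repeated 100,000 times, which makes each attempt of a brute-force attack correspondingly more expensive. Finally, the 32-byte result is encoded in url-safe base64, the key format expected by Fernet.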

Open a Python shell in a terminal and generate a salt with the following instruction.

import os
os.urandom(16)

Two keys generated from the same password but with different salts are different. Therefore, you must not change the salt once your data is encrypted; otherwise, you will no longer be able to decrypt it.
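To observe the effect of the salt without the cryptography library, here is a sketch using the standard-library function hashlib.pbkdf2_hmac, which implements the same PBKDF2-HMAC-SHA256 construction and should therefore produce the same 32 bytes as PBKDF2HMAC for identical parameters. The helper name derive_fernet_key and the example password are hypothetical:

```python
import base64
import hashlib
import os

def derive_fernet_key(password, salt, iterations=100_000):
    """Hypothetical helper: PBKDF2-HMAC-SHA256, 32-byte output, url-safe base64 like generate_key."""
    raw = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)

salt_a = os.urandom(16)
salt_b = os.urandom(16)

k1 = derive_fernet_key("correct horse", salt_a)
k2 = derive_fernet_key("correct horse", salt_a)
k3 = derive_fernet_key("correct horse", salt_b)
print(k1 == k2)  # True: same password, same salt
print(k1 == k3)  # False: same password, different salt
print(len(k1))   # 44: base64 characters for 32 bytes
```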

Salt and key generation

Add the function generate_key(password) to your program and set the variable salt to the value you generated in the Python shell.


ANSWER ELEMENTS

def generate_key(password):
    password = password.encode()
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(),
                     length=32,
                     salt=salt,
                     iterations=100000,
                     backend=default_backend())
    return base64.urlsafe_b64encode(kdf.derive(password))

Enter the master password

To use the function generate_key(password), we need to ask the user to enter a master password.

We could use the function input from the Python standard library, but the password would be displayed as it is typed. Instead, we use the function getpass from the module of the same name.

from getpass import getpass
password = getpass("Enter your master password:")

Replace the key = b'WNlS4K1hLhAVl8JiYV0Fj8e92EiSEQi5VS4KNGNPQCc=' instruction in the main function with instructions that allow the user to enter a password from the keyboard and generate a key.

Since the password file keychain.dat was encrypted with a different key, running the program should now fail with the exception cryptography.fernet.InvalidToken.

For now, we simply recommend deleting the keychain.dat file; it will be recreated with the new key.


ANSWER ELEMENTS

from getpass import getpass


def main():
    if __name__ == "__main__":
        print("Pykey - Password manager")

        if len(sys.argv) <= 1:
            print_help()
        elif sys.argv[1] == "get" and len(sys.argv) == 3:
            action = get_password
        elif sys.argv[1] == "set" and len(sys.argv) == 3:
            action = set_password
        else:
            print_help()

        password = getpass("Enter your master password.")
        key = generate_key(password)

        db = load_passwords(key)
        action(db, sys.argv[2])
        save_passwords(db, key)


Conclusion

At this stage, we have built the outline of a password manager. Many existing programs already provide this service, and we recommend using them rather than a homemade solution: they are developed by computer security experts and kept up to date.

Regarding our software, here is a non-exhaustive list of potential improvements that could be considered and serve as optional questions:

  • How can we handle an incorrect master password, which currently crashes the program?
  • How can we ask for the master password twice when creating the password database?
  • How can we copy a password directly to the clipboard so that it is never displayed on the screen?
  • How can we generate a master password that is easy to remember?

Improvements (optional)

Exception Handling

When an incorrect master password is entered, the key derived from it does not match the one used for encryption. In this case, the method fernet.decrypt raises an InvalidToken exception, and the program stops.

We can catch this exception with a try/except block. To do this, we need to import the exception InvalidToken from cryptography.fernet and wrap the call that may raise it.

Add the following instructions to your program:

try:
    db = load_passwords(key)
except InvalidToken:
    print("Wrong master password.")
    sys.exit()

ANSWER ELEMENTS

from cryptography.fernet import InvalidToken


def main():
    if __name__ == "__main__":
        print("Pykey - Password manager")

        if len(sys.argv) <= 1:
            print_help()
        elif sys.argv[1] == "get" and len(sys.argv) == 3:
            action = get_password
        elif sys.argv[1] == "set" and len(sys.argv) == 3:
            action = set_password
        else:
            print_help()

        password = getpass("Enter your master password.")
        key = generate_key(password)

        try:
            db = load_passwords(key)
        except InvalidToken:
            print("Wrong master password.")
            sys.exit()

        action(db, sys.argv[2])
        save_passwords(db, key)


Copy to Clipboard

To prevent a clear-text password from being displayed on the screen, you can use the pyperclip module to copy a string directly to the clipboard.

The module pyperclip should already be installed if you followed our installation instructions at the beginning of the course.

Otherwise, you can install it by typing the following command:

  • On Windows: python -m pip install pyperclip, OR py -m pip install pyperclip
  • On macOS: python3 -m pip install pyperclip

Modify the get_password function to copy the password to the clipboard. Take inspiration from the following instruction: pyperclip.copy("pass").

Implement this functionality.


ANSWER ELEMENTS

import pyperclip

def get_password(db, name):
    if name in db:
        print(f'Password for "{name}" has been copied to the clipboard.')
        pyperclip.copy(db[name])
    else:
        print(f'No password found for "{name}". Available entries are:')
        for n in db:
            print(f"- {n}")

Typing your password twice

The function load_passwords(key) creates a new database if the file keychain.dat does not exist; the first two instructions take care of this.

def load_passwords(key):
    if not os.path.isfile(keychain_path):
        save_passwords({}, key)

    with open(keychain_path, 'rb') as f:
        data = f.read()

    fernet = Fernet(key)
    data = fernet.decrypt(data)
    data = data.decode()

    db = {}
    for l in io.StringIO(data):
        s = l.strip().split(delimiter)
        db[s[0]] = s[1]
    return db

At the time of calling this function, the user has already entered the master password, and the associated key is passed as a parameter to the load_passwords(key) function.

In the case where the keychain.dat file does not exist, we can modify this function to prompt the user to re-enter their master password. This allows us to generate a second key. If this key is identical to the key passed as a parameter, then the user has entered the same password twice, and we can recreate the keychain.dat file. If not, we can inform the user that they made a mistake and exit the program (using sys.exit()).

Implement this functionality.


ANSWER ELEMENTS

def load_passwords(key):
    if not os.path.isfile(keychain_path):
        print("No keychain file found.")
        password = getpass("Enter your master password again to recreate a keychain file.")
        key2 = generate_key(password)
        if key == key2:
            print("Empty keychain file generated.")
            save_passwords({}, key)
        else:
            print("You entered two different passwords. No keychain file generated.")
            sys.exit()

    with open(keychain_path, 'rb') as f:
        data = f.read()

    fernet = Fernet(key)
    data = fernet.decrypt(data)
    data = data.decode()

    db = {}
    for l in io.StringIO(data):
        s = l.strip().split(delimiter)
        db[s[0]] = s[1]
    return db


Diceware (optional)

The method we saw earlier for generating passwords is robust, but it has two drawbacks:

  • it requires trusting the computer for random generation;
  • it produces passwords that are difficult to remember...

We might consider creating passwords by other means, for example by starting from common words and applying substitutions to satisfy the strength criteria. Unfortunately, such passwords have low entropy and should be avoided.

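The XKCD comic "Password Strength" (no. 936) illustrates this well: it recommends passwords made of several randomly chosen common words, which is the idea behind a well-known generation method, the Diceware method.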

This method relies on a list of about 8000 words (7776 in the original Diceware list). A password made of several words drawn at random from this list is called a passphrase.

Choosing one word at random from a list of about 8000 words gives approximately 13 bits of entropy ({$\log_2 8000 \approx 12.97$}). To obtain a password of medium strength ({$F \geq 80$}), you should therefore choose at least 7 words, since 6 words only give about 78 bits.

This method can be carried out without a computer. Each word in the list is identified by a five-digit number whose digits each range from 1 to 6. In other words, each word can be selected by rolling 5 dice ({$6^5=7776$} possible outcomes).

Alternative word lists are available today. In particular, the Electronic Frontier Foundation (EFF) has published lists based on pop culture (Star Wars, Harry Potter, Game of Thrones, Star Trek). The principle remains the same, but a word is selected by rolling three 20-sided dice ({$20^3=8000$} possible outcomes).
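Reading the dice rolls as the digits of a number makes the correspondence between rolls and words explicit. A sketch, with the hypothetical helper rolls_to_index, assuming the word list is indexed from 0 in the order of the roll identifiers:

```python
import math

def rolls_to_index(rolls, sides):
    """Read dice rolls (each between 1 and sides) as the digits of a base-`sides` number."""
    index = 0
    for r in rolls:
        if not 1 <= r <= sides:
            raise ValueError(f"invalid roll {r} for a {sides}-sided die")
        index = index * sides + (r - 1)
    return index

print(rolls_to_index([1, 1, 1, 1, 1], 6))  # 0: first word of a 6^5 = 7776-word list
print(rolls_to_index([6, 6, 6, 6, 6], 6))  # 7775: last word
print(rolls_to_index([20, 20, 20], 20))    # 7999: last word of a 20^3 = 8000-word list

bits_per_word = math.log2(6 ** 5)          # about 12.92 bits per word
print(math.ceil(80 / bits_per_word))       # 7 words needed to reach F >= 80
```

Every roll combination maps to a distinct index, so each word is equally likely as long as the dice are fair.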

Generate a passphrase

The following function reads a cleaned Diceware file (with the header lines and the dice-roll identifiers removed) using the utf-8 encoding. The file to use is starwars_8k_2018.txt.

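def read_diceware():
    words = []
    # Explicit encoding: the default depends on the operating system
    with open("starwars_8k_2018.txt", "r", encoding="utf-8") as f:
        for l in f:
            word = l.strip()
            if word:  # skip blank lines
                words.append(word)
    return words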

You can now write a function generate_passphrase that generates a random passphrase. For readability, you can capitalize the first letter of each word with the string method capitalize.
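A minimal sketch of generate_passphrase, assuming it receives the word list as a parameter (for example, the result of read_diceware()) and joins the words without a separator by default. These choices, the parameter names, and the helper passphrase_entropy are hypothetical. Note that capitalize also converts the rest of each word to lowercase.

```python
import math
import secrets

def generate_passphrase(words, n=7, separator=""):
    """Draw n words uniformly at random with secrets.choice and capitalize each one."""
    return separator.join(secrets.choice(words).capitalize() for _ in range(n))

def passphrase_entropy(words, n):
    """F = log2(N^n) = n * log2(N), where N is the number of distinct words."""
    return n * math.log2(len(set(words)))

# Usage with a tiny stand-in list; in the lab, pass read_diceware() instead
sample = ["jedi", "wookiee", "falcon", "droid"]
print(generate_passphrase(sample, n=3, separator="-"))
print(passphrase_entropy(sample, 3))  # 6.0
```

With the 8000-word Star Wars list and the default of 7 words, passphrase_entropy gives about 91 bits, which is medium strength.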