
Using #AI in #Cryptanalysis

SETUP

First, we set some helper variables that define the project.

In [8]:

Next, we load the data from the CSV file. We also cache its contents in pickle format: the CSV file takes a few seconds to load, whereas the pickle file loads almost instantaneously, so we use the cache during development to keep the developer sane.
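The caching pattern described above can be sketched roughly as follows. This is a minimal standard-library version, not the notebook's original cell; the function name `load_rows` and the file paths are hypothetical. With pandas, the same idea would use `pandas.read_csv`, `DataFrame.to_pickle` and `pandas.read_pickle`.

```python
import csv
import pickle
from pathlib import Path


def load_rows(csv_path, cache_path):
    """Return the CSV rows as a list of dicts, using a pickle cache when present."""
    csv_path, cache_path = Path(csv_path), Path(cache_path)
    # Reuse the cache only if it is at least as new as the CSV it was built from.
    if cache_path.exists() and cache_path.stat().st_mtime >= csv_path.stat().st_mtime:
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    # Cache miss: parse the CSV (slow path) and write the cache for next time.
    with csv_path.open(newline="") as fh:
        rows = list(csv.DictReader(fh))
    with cache_path.open("wb") as fh:
        pickle.dump(rows, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return rows
```

Comparing modification times means that editing the CSV invalidates the cache automatically, which avoids silently training on stale data.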

In [5]:
Out[5]:
   input                                            output
0  -2|-72|-11|-2|18|100|-69|15|93|120|15|-97|-35|…  1D8HB58D04F177301
1  -105|-53|20|-126|-87|13|-124|65|58|-116|63|34|…  JM1BJ225621628507
2  90|-56|-40|-3|95|0|42|4|-112|48|-37|-10|-7|115…  JN1BJ1CP6HW007566
3  116|-85|109|-127|30|-30|23|13|40|127|-97|67|-1…  WBA3B1G57ENN90705
4  -121|-22|102|72|-31|-110|-40|36|-117|-119|86|-…  1G1PK5SB4E7391908

DATA PREPARATION

We then transform the data into x, the input features, and y, the output predictions. We could have continued to use pandas as our data structure, but we don’t need its advanced features, so we simply load the data into NumPy arrays, which are much more lightweight.
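As a rough illustration of this step, the sketch below converts the two string columns into NumPy arrays. The function name `to_arrays`, the dtypes, and the choice to store each output character as its character code are assumptions made for the example, not details taken from the original notebook.

```python
import numpy as np


def to_arrays(inputs, outputs):
    """Convert raw string columns into integer feature and label arrays."""
    # Each input is a "|"-separated string of signed integers.
    x = np.array([[int(v) for v in s.split("|")] for s in inputs], dtype=np.int16)
    # Each output is a fixed-length string; store one character code per column.
    y = np.array([[ord(ch) for ch in s] for s in outputs], dtype=np.uint8)
    return x, y
```

Each column of y then holds the labels for one output position, which is convenient when a separate classifier is trained per character.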

In [9]:

We split the data into three portions.

  • Test – used for final testing, completely unseen by the model
  • Validation – used for tuning the model hyperparameters, functions as a “pretend” test set
  • Train – the data used to train the model

The key difference between the test set and the validation set is that the model's hyperparameters are tuned against the validation set but never against the test set.
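A three-way split along these lines could look like the sketch below. The 80/10/10 proportions, the fixed seed and the function name `three_way_split` are assumptions for illustration; the post does not state the actual split sizes.

```python
import numpy as np


def three_way_split(x, y, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then cut the rows into train / validation / test portions."""
    n = len(x)
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    # x and y are indexed with the same row indices so pairs stay aligned.
    return (
        (x[train_idx], y[train_idx]),
        (x[val_idx], y[val_idx]),
        (x[test_idx], y[test_idx]),
    )
```

Shuffling with a fixed seed keeps the three portions identical between runs, so hyperparameter comparisons on the validation set stay meaningful.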

In [10]:

We then build and train our models. We use a separate random forest classifier to predict each character in the output sequence, which gives 17 models in total. In our experiments this was the best-performing approach.

At a high level, a random forest classifier is built from decision trees, each of which is constructed as a tree of decision points based on the input data. The decision points are adjusted during training. Multiple trees are built, and their combined result is the prediction.

For more information see here: https://towardsdatascience.com/understanding-random-forest-58381e0602d2

Other models we experimented with include a simple feedforward network, a recurrent network and an autoencoder neural network. However, the random forest approach was the best performing by a large margin.
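The one-classifier-per-character idea can be sketched with scikit-learn as shown below. This is a simplified illustration, not the notebook's training cell: the function names, `n_estimators=50` and the fixed seed are assumptions, and y is assumed to hold one character code per output position.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_position_models(x, y, n_estimators=50, seed=0):
    """Fit one RandomForestClassifier per output position (column of y)."""
    models = []
    for pos in range(y.shape[1]):
        clf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        clf.fit(x, y[:, pos])
        models.append(clf)
    return models


def predict_sequences(models, x):
    """Predict each position independently and join the characters per row."""
    columns = [m.predict(x) for m in models]  # one array of character codes per position
    return ["".join(chr(int(c)) for c in row) for row in zip(*columns)]
```

Because every position is predicted independently, a character that depends on all the others at once gets no help from the neighbouring predictions.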

In [81]:
finished with model #0
finished with model #1
finished with model #2
finished with model #3
finished with model #4
finished with model #5
finished with model #6
finished with model #7
finished with model #8
finished with model #9
finished with model #10
finished with model #11
finished with model #12
finished with model #13
finished with model #14
finished with model #15
finished with model #16

EVALUATION

Now we evaluate the models to see how they perform and how well they generalize.
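Since each character has its own model, a natural metric is accuracy per output position. The sketch below shows one way to compute it from lists of true and predicted strings; the function names are hypothetical and this is not the notebook's evaluation code.

```python
import numpy as np


def to_char_matrix(sequences):
    """Turn equal-length strings into a 2-D array with one character per cell."""
    return np.array([list(s) for s in sequences])


def per_position_accuracy(true_sequences, predicted_sequences):
    """Fraction of sequences predicted correctly at each character position."""
    truth = to_char_matrix(true_sequences)
    guess = to_char_matrix(predicted_sequences)
    # Compare cell by cell, then average down the rows for each position.
    return (truth == guess).mean(axis=0)
```

Running this once each for the train, validation and test sets yields three arrays of 17 values that can be plotted side by side.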

In [82]:
In [83]:

Next, we graph the results. The training set attains near-perfect accuracy, which is to be expected with tree-based models. The validation and test sets are consistent with each other: the first 8 characters and the 11th character are predicted most accurately. The 9th character, the security digit, is the least accurate, as expected. Every character after the 9th (with the exception of the 11th) also has drastically reduced accuracy.
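One possible reading of why the 9th character is so hard: the sample outputs look like 17-character vehicle identification numbers (VINs). That is an inference from the data, not something stated in this post. Under that assumption, the 9th character is the standard North American VIN check digit, which depends on all 16 other characters. A sketch of that calculation:

```python
# Transliteration values for letters; I, O and Q are not used in VINs.
LETTER_VALUES = dict(zip("ABCDEFGH", range(1, 9)))
LETTER_VALUES.update(zip("JKLMN", range(1, 6)))
LETTER_VALUES.update({"P": 7, "R": 9})
LETTER_VALUES.update(zip("STUVWXYZ", range(2, 10)))

# Position weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def check_digit(vin):
    """Compute the check digit (9th character) for a 17-character VIN."""
    total = 0
    for ch, weight in zip(vin, WEIGHTS):
        value = int(ch) if ch.isdigit() else LETTER_VALUES[ch]
        total += value * weight
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)
```

For the five sample outputs shown earlier, this function reproduces the 9th character of each, for example `check_digit("1D8HB58D04F177301")` gives `"0"`. A per-character classifier has to learn a weighted sum modulo 11 over the whole sequence to get this position right, which would be consistent with the low accuracy observed.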

In [84]:
In [85]:
In [86]:

USAGE

Now we construct our models for production by training on the entire dataset and serializing them to file.
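Serialization can be done with the standard pickle module, as in the sketch below. The one-file-per-model layout, the `model_{i}.pkl` naming and the function names are assumptions for illustration; the original notebook may store the models differently.

```python
import pickle
from pathlib import Path


def save_models(models, folder):
    """Write each model to its own pickle file inside the given folder."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for i, model in enumerate(models):
        with (folder / f"model_{i}.pkl").open("wb") as fh:
            pickle.dump(model, fh)


def load_models(folder, count):
    """Read back the first `count` models written by save_models, in order."""
    models = []
    for i in range(count):
        with (Path(folder) / f"model_{i}.pkl").open("rb") as fh:
            models.append(pickle.load(fh))
    return models
```

Pickle files should only be loaded from a trusted source, and pickled scikit-learn models are best loaded with the same library version that wrote them.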

In [87]:
finished with model #0
finished with model #1
finished with model #2
finished with model #3
finished with model #4
finished with model #5
finished with model #6
finished with model #7
finished with model #8
finished with model #9
finished with model #10
finished with model #11
finished with model #12
finished with model #13
finished with model #14
finished with model #15
finished with model #16
In [90]:
finished 0
finished 1
finished 2
finished 3
finished 4
finished 5
finished 6
finished 7
finished 8
finished 9
finished 10
finished 11
finished 12
finished 13
finished 14
finished 15
finished 16

DEPLOYMENT

After the models are serialized, they can be used by our server. We use a Flask server that loads the models and responds to inputs with predictions. The code lives in “server.py” and should be fairly self-explanatory.

To run the server, make sure python is installed from

After installing python:

  1. Open the command line program – cmd on Windows, Terminal on OS X
  2. Navigate to the server folder
  3. Run the install file – note that this may take some time depending on the computer specifications and may look like the program has frozen; please allow it to finish
    a. For OS X: run “./install.sh”
    b. For Windows: run “install”
  4. Run the run file
    a. For OS X: run “./run.sh”
    b. For Windows: run “run”
  5. After the server starts, a process will run to load the models. The console output will show the progress, and the final message “ALL MODELS LOADED” will indicate that the server is ready to use.

A Flask server will now be available at “http://localhost:5000”. It can be queried by any REST client. For testing, I recommend Postman or Restlet Client.

The following endpoints are supported:

  • /test – a test endpoint that responds with a test message
  • /info – responds with test probabilities of each character in the output, can be used to estimate confidence
  • / – accepts a parameter “text” containing 24 signed integers separated by “|” and returns the predicted 17-character output sequence
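To make the endpoint descriptions concrete, here is a minimal Flask sketch of the “/test” and “/” routes. It is not the actual “server.py”: `predict_text` is a hypothetical placeholder for the loaded models, the plain-text response and the 400 error handling are assumptions, and “/info” is left out because its response format is not described here.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_text(values):
    """Hypothetical stand-in for the real models: returns a 17-character placeholder."""
    return "?" * 17


def parse_text(text):
    """Parse 24 signed integers separated by "|"; raise ValueError otherwise."""
    values = [int(part) for part in text.split("|")]
    if len(values) != 24:
        raise ValueError("expected 24 integers")
    return values


@app.route("/test")
def health():
    # Simple liveness check, mirroring the /test endpoint.
    return "test"


@app.route("/")
def predict():
    try:
        values = parse_text(request.args.get("text", ""))
    except ValueError as exc:
        # Malformed or missing input is reported as a client error.
        return jsonify(error=str(exc)), 400
    return predict_text(values)
```

Validating the input before it reaches the models keeps malformed requests from surfacing as server errors.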

For example: http://localhost:5000?text=-2|-72|-11|-2|18|100|-69|15|93|120|15|-97|-35|52|85|-114|53|-123|-1|-101|-38|125|-100|113
Will return: 1D8HB58D04F177301
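The same request can be made from Python with only the standard library. The sketch below assumes the server is running locally and returns the prediction as plain text in the response body; the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000/"


def build_query_url(values, base_url=BASE_URL):
    """Build the prediction URL, percent-encoding the "|" separators."""
    text = "|".join(str(v) for v in values)
    return base_url + "?" + urlencode({"text": text})


def predict(values, base_url=BASE_URL, timeout=10):
    """Send the request and return the response body as a string."""
    with urlopen(build_query_url(values, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Encoding the query string explicitly avoids problems with clients that reject a raw “|” in a URL.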
