---
library_name: transformers
tags:
- cryptology
- cipher
datasets:
- agentlans/high-quality-english-sentences
language:
- en
base_model:
- google-t5/t5-base
license: apache-2.0
---

This project contains a text-to-text model that decrypts English text encoded with a substitution cipher.
In a substitution cipher, each letter of the plaintext is replaced by a fixed, unique letter to form the ciphertext.
The model uses the statistical and linguistic properties of English to make educated guesses about the letter substitutions, aiming to recover the original plaintext message.

The model targets monoalphabetic substitution ciphers over English text, and its output is the alphabet used in encoding.

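For readers new to the scheme, here is a minimal sketch of how a monoalphabetic substitution cipher produces ciphertext. The key `KEY` and the helper `encrypt` are hypothetical names chosen for illustration; this is not the key behind the example on this card, and it is not necessarily how the training data was generated.

```python
import string

# Hypothetical key: the i-th letter of a-z is replaced by the i-th letter of KEY.
KEY = "qwertyuiopasdfghjklzxcvbnm"


def encrypt(plaintext: str, key: str = KEY) -> str:
    """Replace every letter a-z / A-Z using the key; copy everything else."""
    if sorted(key) != list(string.ascii_lowercase):
        raise ValueError("key must be a permutation of the 26 lowercase letters")
    table = str.maketrans(
        string.ascii_lowercase + string.ascii_uppercase,
        key + key.upper(),
    )
    return plaintext.translate(table)


print(encrypt("Hello, world."))
```

Because the key is a permutation, every plaintext letter always maps to the same ciphertext letter, which is what lets letter and word statistics leak through to the ciphertext.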
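Example:

Encoded text:

**Hd adcdcwda yod drdqyn zk zsa boiluozzu.**

Plain text:

**We remember the events of our childhood.**

Alphabet (output):

**rcme...wi.fl.sh.nvu.d.b.to**

The output has one character for each letter of the encoded text's alphabet, 'a' through 'z'. Here 'r' is in position 1, so every 'a' in the encoded text stands for 'r' in the plain text. Likewise 'e' is in position 4, so 'd' stands for 'e'. In this example, the '.' positions correspond to letters that do not appear in the encoded text.
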
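The position rule above can be applied mechanically to turn the output alphabet back into plain text. The following is a minimal sketch, assuming the output is always a 26-character string and that '.' means the model gave no letter for that position. The names `apply_alphabet` and `unknown` are illustrative and are not part of the model or of transformers.

```python
import string

LETTERS = string.ascii_lowercase


def apply_alphabet(cipher_text: str, alphabet: str, unknown: str = "?") -> str:
    """Decode cipher_text with the 26-character alphabet string."""
    if len(alphabet) != len(LETTERS):
        raise ValueError("expected one output character per letter a-z")
    # Position i of the alphabet gives the plain letter for the i-th cipher letter.
    table = dict(zip(LETTERS, alphabet))
    result = []
    for ch in cipher_text:
        plain = table.get(ch.lower())
        if plain is None:
            result.append(ch)        # not a letter a-z: copy unchanged
        elif plain == ".":
            result.append(unknown)   # assumed meaning: no letter given
        else:
            result.append(plain.upper() if ch.isupper() else plain)
    return "".join(result)


print(apply_alphabet(
    "Hd adcdcwda yod drdqyn zk zsa boiluozzu.",
    "rcme...wi.fl.sh.nvu.d.b.to",
))
```

Run on the example from this card, the sketch reproduces the plain text shown above, including the capital letter and the final period.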
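Suggested Usage:

```py
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the model and tokenizer (assumes a T5-style seq2seq checkpoint, per base_model)
model_id = ""  # Hub ID or local path of this model here!
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

cipher_text = ""  # Encoded text here!
inputs = tokenizer(cipher_text, return_tensors="pt", padding=True, truncation=True, max_length=256).to(model.device)
outputs = model.generate(**inputs, max_length=256)  # passes input_ids and attention_mask
decoded_text = tokenizer.decode(outputs[0], skip_special_tokens=True)  # the alphabet string
```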