---
library_name: transformers
tags:
- cryptology
- cipher
datasets:
- agentlans/high-quality-english-sentences
language:
- en
base_model:
- google-t5/t5-base
license: apache-2.0
---

This project contains a text-to-text model designed to decrypt English text encoded using a substitution cipher.
In a substitution cipher, each letter in the plaintext is replaced by a corresponding, unique letter to form the ciphertext.
The model leverages statistical and linguistic properties of English to make educated guesses about the letter substitutions,
aiming to recover the original plaintext message.
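A minimal sketch of how such a ciphertext can be produced. It assumes a random 26-letter key that maps each plaintext letter to a distinct cipher letter, preserves case, and leaves spaces and punctuation untouched. `make_key` and `encrypt` are hypothetical helpers for illustration only. They are not part of this model or the `transformers` API, and the model's training data may have been generated differently.

```python
import random
import string


def make_key(seed=None):
    """Hypothetical helper: a random permutation of a-z.
    key[i] is the cipher letter used for the i-th plaintext letter."""
    letters = list(string.ascii_lowercase)
    random.Random(seed).shuffle(letters)
    return "".join(letters)


def encrypt(plaintext, key):
    """Hypothetical helper: monoalphabetic substitution that keeps case;
    non-letters pass through unchanged."""
    table = str.maketrans(
        string.ascii_lowercase + string.ascii_uppercase,
        key + key.upper(),
    )
    return plaintext.translate(table)


key = make_key(seed=0)
cipher_text = encrypt("We remember the events of our childhood.", key)
print(cipher_text)
```

Because every plaintext letter always maps to the same cipher letter, letter frequencies and word shapes survive encryption. These are the regularities that make recovering the key possible.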

The model handles monoalphabetic English substitution ciphers. It does not output the plaintext directly. Instead, it outputs the alphabet (key) used in encoding.

Example:

Encoded text: 
**Hd adcdcwda yod drdqyn zk zsa boiluozzu.**

Plaintext:
**We remember the events of our childhood.**

Alphabet (output): 
**rcme...wi.fl.sh.nvu.d.b.to**

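Each position in the output alphabet corresponds to a ciphertext letter from a to z. The character at that position is the plaintext letter it stands for. Here 'r' is in position 1, so every 'r' in the plaintext was encoded as 'a'. In this example, each '.' marks a ciphertext letter that does not occur in the encoded text, so its substitution cannot be determined.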
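The example alphabet can be reproduced from the plaintext/ciphertext pair. The sketch below assumes the format implied by the example: position i holds the plaintext letter encoded as the i-th letter of a to z, and '.' fills positions whose cipher letter never appears. `key_alphabet` is a hypothetical helper, not code published with this model.

```python
import string


def key_alphabet(plaintext, ciphertext, unknown="."):
    """Hypothetical helper: build the 26-character alphabet in which slot i
    holds the plaintext letter encoded as the i-th cipher letter (a-z)."""
    slots = [unknown] * 26
    # A substitution cipher keeps positions aligned, so pair characters directly
    for p, c in zip(plaintext.lower(), ciphertext.lower()):
        if c in string.ascii_lowercase:
            slots[ord(c) - ord("a")] = p
    return "".join(slots)


print(key_alphabet(
    "We remember the events of our childhood.",
    "Hd adcdcwda yod drdqyn zk zsa boiluozzu.",
))
# rcme...wi.fl.sh.nvu.d.b.to
```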

Suggested Usage:
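```py
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model_id = "path/to/this-model"  # placeholder: replace with this model's Hub repo id or local path
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to(device)
cipher_text = ""  # Encoded text here!
inputs = tokenizer(cipher_text, return_tensors="pt", padding=True, truncation=True, max_length=256).to(device)
outputs = model.generate(**inputs, max_length=256)  # passes input_ids and attention_mask
# The decoded output is the predicted alphabet, not the plaintext
decoded_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
```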
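The generated string is the alphabet, so it still has to be applied to the ciphertext. The following is a minimal sketch that assumes the 26-character format from the example above: position i is ciphertext letter i of a to z, and '.' means no prediction. `apply_alphabet` is a hypothetical helper, not part of `transformers` or this model. Cipher letters without a prediction are left unchanged.

```python
import string


def apply_alphabet(cipher_text, alphabet, unknown="."):
    """Hypothetical helper: decrypt by mapping the i-th cipher letter (a-z)
    to alphabet[i], preserving case."""
    out = []
    for ch in cipher_text:
        idx = string.ascii_lowercase.find(ch.lower())
        if idx == -1 or idx >= len(alphabet) or alphabet[idx] == unknown:
            out.append(ch)  # non-letter, or no prediction for this cipher letter
        else:
            plain = alphabet[idx]
            out.append(plain.upper() if ch.isupper() else plain)
    return "".join(out)


print(apply_alphabet("Hd adcdcwda yod drdqyn zk zsa boiluozzu.", "rcme...wi.fl.sh.nvu.d.b.to"))
# We remember the events of our childhood.
```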