SCIpher - A Scholarly Message Encoder

About

SCIpher is a program that can hide text messages within seemingly innocuous scientific conference advertisements. It is based on the context-free grammar used in SCIgen, but instead of randomly piecing together sentences, it uses your input message to control the text it generates. Then, given SCIpher output, it can recover the original message by reverse-engineering the choices made at encoding time.

One useful purpose for such a program is to communicate secret messages that don't look like secret messages. Encrypted emails, for example, might signal to snoopers that you are an interesting person who bears investigation. However, in our experience, when you send out a Call for Papers (CFP) announcement, it's very unlikely that anyone will read it.

In addition, you can use these context-free CFPs to solicit submissions to your very own academic conference. If WMSCI could do it, why not you?

Encode your message



Note: We send your message to our server for encoding, over an unencrypted link. Though we do not log your message, you should not put anything secret in this form. If you like, you can run our code on your local machine, so that the secret never leaves your computer unencoded. That said, encoding is not the same as encryption -- anyone with access to our program or this website could recover your message. For real secrets, you should always encrypt first with proper encryption, and only then encode the result with SCIpher.

Decode your message



Note: Our servers will see your decoded message, and send it back to your browser via an unencrypted link. Don't use this site to decode anything you suspect is a real secret!

Background

SCIpher was born from three things:

At a high level, the encoder works much like SCIgen. A context-free grammar (initially seeded with words and phrases from the original SCIgen grammar, and then filled in with language and structure more specific to CFPs) describes a very large set of possible CFPs. Imagine the grammar as being shaped like a tree (in the CS sense) -- the root branches out into several possible overall structures, each structure branches out into several possible sentences or list items, etc. SCIpher interprets your message as a sequence of numbers, and at each node in the tree, it decides which branch to take based on the next number in the sequence. When decoding, it parses the CFP to recover the original decisions made during encoding (i.e., which branches were taken), and from there it can recover the sequence of numbers and interpret it as the original message again.
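To make the idea concrete, here is a minimal sketch in Python. It is not the SCIpher implementation: the tiny grammar, the names GRAMMAR, encode, parse, and decode, and the mixed-radix number scheme are all assumptions made for illustration, and the real program uses NLTK and a far larger grammar. The sketch treats the message as one big integer, uses the remainder at each grammar node to pick a branch, and decodes by parsing the text to recover those picks in order.

```python
# Hypothetical toy grammar, NOT the real SCIpher grammar.
# Upper-case keys are nonterminals; every other symbol is a literal word.
GRAMMAR = {
    "CFP": [["GREETING", "EVENT", "TOPIC"]],
    "GREETING": [["we", "invite", "papers", "for"], ["submit", "now", "to"],
                 ["join", "us", "at"], ["call", "for", "papers:"]],
    "EVENT": [["the", "SIZE", "workshop", "on"],
              ["the", "SIZE", "symposium", "on"]],
    "SIZE": [["first"], ["annual"], ["international"], ["global"]],
    "TOPIC": [["robots"], ["compilers"], ["networks"], ["databases"],
              ["grammars"], ["caches"], ["kernels"], ["proofs"]],
}


def encode(message: bytes) -> str:
    """Hide message in text by letting it choose the grammar branches."""
    # A leading 0x01 byte keeps leading zero bytes from being lost.
    n = int.from_bytes(b"\x01" + message, "big")
    words = []

    def expand(symbol):
        nonlocal n
        if symbol not in GRAMMAR:          # terminal: emit the word
            words.append(symbol)
            return
        options = GRAMMAR[symbol]
        n, choice = divmod(n, len(options))  # next "digit" picks the branch
        for child in options[choice]:
            expand(child)

    while n > 0:                           # one CFP sentence per iteration
        expand("CFP")
    return " ".join(words)


def parse(symbol, tokens, pos):
    """Return (choices, new_pos) for symbol at tokens[pos], or None."""
    if symbol not in GRAMMAR:
        if pos < len(tokens) and tokens[pos] == symbol:
            return [], pos + 1
        return None
    options = GRAMMAR[symbol]
    for index, production in enumerate(options):
        choices, cursor = [(index, len(options))], pos
        for child in production:
            result = parse(child, tokens, cursor)
            if result is None:
                break                      # this production does not match
            choices += result[0]
            cursor = result[1]
        else:
            return choices, cursor         # every child matched
    return None


def decode(text: str) -> bytes:
    """Recover the message by re-deriving the branch choices from the text."""
    tokens, choices, pos = text.split(), [], 0
    while pos < len(tokens):
        result = parse("CFP", tokens, pos)
        if result is None:
            raise ValueError("text was not produced by this grammar")
        choices += result[0]
        pos = result[1]
    n = 0
    for choice, radix in reversed(choices):  # undo the divmod steps
        n = n * radix + choice
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]                         # drop the 0x01 marker byte


if __name__ == "__main__":
    cfp = encode(b"hi")
    print(cfp)
    print(decode(cfp))
```

In this toy grammar each generated sentence offers 4 x 2 x 4 x 8 = 256 combinations, so it carries exactly one byte, and longer messages simply produce more sentences. The simple parser only works because the toy grammar is unambiguous; a realistic grammar would need a proper parser, which is the role NLTK plays in SCIpher.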

Some cool features:

Code

Get the code from GitHub: https://github.com/strib/scipher

Note that we use NLTK to manage and parse the SCIpher grammar. It's cool; use it!

Related Work

Using generic text to hide secret messages is not new. Steganography has been around forever. Even the technique of using a grammar has been around for a while:

SCIpher extends this work by adding the features mentioned above and, in the SCIgen tradition, by hopefully maximizing amusement.

Contact

scigen-dev at the domain pdos.csail.mit.edu