No. 402
File 134186581068.png - (7.62KB, 256x256, page.png)
>>398
Thumbnails would be a moot point anyway because 4chan uses jpeg compression for all thumbnails, thus compromising the integrity of the data even if the file dimensions are within the limits of a thumbnail.
>>399
Post-processing of images is generally not done on chans due to the excessive burden it places on the server CPU. Any post-processing done to save space would most likely consist of stripping metadata. However, since the data has literally been transformed into a visual representation, the data is safe from tampering.
In the most basic form, transfer mode, it might work something like this:
*Grayscale bitmap is created (all black/00).
*Must be a power-of-2 square (side length a power of 2) with total pixel area exceeding the size of the file to be encoded.
>2x2=4
>4x4=16
>8x8=64
>16x16=256
>32x32=1024
>64x64=4096
>128x128=16384
>256x256=65536
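The table above can be sketched in a few lines. A minimal example (the function name `side_for` is my own, not from the post), assuming one byte of file data per grayscale pixel:

```python
# Sketch: pick the smallest power-of-2 square whose pixel count
# (1 byte per grayscale pixel) can hold the whole file.
def side_for(size):
    n = 2
    while n * n < size:
        n *= 2
    return n
```

So a 5000-byte file needs a 128x128 bitmap (64x64 = 4096 is too small), matching the table.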
*Bitmap is opened in hex editor.
*Hex data is pasted at the first pixel. (This can be identified by placing a white pixel at the top left and bottom left of the bitmap before saving. The first pixel will then be hex FF.)
*Locate the 'last' pixel. (The second FF near end of file.)
This is where the rebuild parameters are stored.
The rebuild parameters are a 17-byte string containing instructions for recreating the original bitmap file so the hex data can be extracted. (It could be shorter than 17 bytes if we came up with a logical system rather than a human-identifiable one.)
The FF byte is changed to F0.
F0 (format) is followed by 1 byte to represent the original file format. There are 2 formats which can be used to store the data:
>b3 (uncompressed bitmap)
>1f (uncompressed tiff)
Bitmap is preferred because the file structure is arranged so that when the data is converted to png, the rebuild parameters will be the first 17 pixels at the top left corner.
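As a sanity check on the uncompressed-bitmap (b3) choice, here's a rough stdlib-only sketch of wrapping the padded pixel bytes in an 8bpp grayscale BMP. The function name and exact field values are my own; it assumes a side length of at least 4 so rows need no padding, and it includes the 256-entry grayscale palette that 8bpp BMPs require (part of the "header garbage" mentioned further down):

```python
import struct

def gray_bmp(pixels, n):
    # Sketch: wrap n*n grayscale bytes in an uncompressed 8bpp BMP.
    # BMP stores rows bottom-up, which is why a white pixel at the
    # image's top left ends up near the end of the file on disk.
    pal = b"".join(struct.pack("<BBBB", i, i, i, 0) for i in range(256))
    offset = 14 + 40 + len(pal)          # file header + info header + palette
    rows = b"".join(pixels[r * n:(r + 1) * n] for r in reversed(range(n)))
    hdr = struct.pack("<2sIHHI", b"BM", offset + len(rows), 0, 0, offset)
    info = struct.pack("<IiiHHIIiiII", 40, n, n, 1, 8, 0, len(rows), 0, 0, 256, 0)
    return hdr + info + pal + rows
```

For a 256x256 image this gives 14 + 40 + 1024 + 65536 = 66614 bytes on disk.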
Next...
BD (bit depth) is followed by a 2-byte code.
Byte 1 (color depth):
>01 (grayscale)
>02 (grayscale+alpha)
could be used to provide more storage without creating excessively large bitmaps
>03 (RGB)
>04 (RGB+alpha) hypothetically
Byte 2 (bits per pixel):
>08 (8bpp - standard)
>16 (16bpp - not supported by most image programs)
Could try splitting bytes between the gray and alpha channels; not sure if that would create more compressible redundancy or just complicate things more by doubling the color depth.
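For what it's worth, the splitting idea could look like this nibble split (my own sketch; whether it actually compresses better is the open question above):

```python
# Sketch: split each byte into two 4-bit values, one for the gray
# channel and one for alpha, and recombine them on extraction.
def split(data):
    return [(b >> 4, b & 0x0F) for b in data]

def join(pairs):
    return bytes((g << 4) | a for g, a in pairs)
```

Round-tripping is lossless: `join(split(data)) == data`.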
After this we have:
DA (data type)
>AC (ascii: text data)
>B1 (binary)
And finally we have the raw data locations. Unfortunately, these offsets assume that all programs will create a (grayscale) bitmap with the same amount of header garbage. I haven't checked into this yet.
DB (data begin) is followed by 4 bytes which refer to the offset for the start of the data portion.
DE (data end) is followed by 4 bytes designating the offset for the end of the data.
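Putting the fields together, the 17-byte string could be packed like this (a sketch; the function name is mine, and I'm assuming big-endian byte order for the two 4-byte offsets since the post doesn't specify):

```python
import struct

# Sketch: pack the 17-byte rebuild parameter string using the
# human-identifiable codes from the post (F0, BD, DA, DB, DE).
def rebuild_params(fmt, depth, bpp, dtype, begin, end):
    return (bytes([0xF0, fmt, 0xBD, depth, bpp, 0xDA, dtype])
            + b"\xDB" + struct.pack(">I", begin)    # data begin offset
            + b"\xDE" + struct.pack(">I", end))     # data end offset
```

That's 7 + 5 + 5 = 17 bytes, matching the stated length: one byte per marker (5 markers), plus 1+2+1 bytes for format/bit depth/data type and 4+4 bytes for the two offsets.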
At first I thought I'd just write this in ascii at the EoF, but then I realized that would leave variable-length strings to be identified, which makes things more complicated and diminishes the number of like pixels.
The attached image is this web page encoded using this method.