Fixing Crystal MIME: Attachments + Charset Handling
When you start dealing with real-world email, you quickly learn that the happy-path examples never survive contact with the messy stuff people actually send.
I ran head-first into this while building my MailVault project (a lightweight mail archiver). I needed to parse .eml
files, extract text cleanly, and save attachments reliably. The stock crystal-mime
shard handled basic text bodies, but it fell a bit short on two fronts:
- Attachments weren’t surfaced properly — non-text parts were ignored or lumped into the body.
- Charset handling was missing — anything outside UTF-8/ASCII turned into garbage characters.
So, I forked the library and patched in the features I needed.
What Was Missing
-
Attachments
- Multiparts weren’t recursed deeply enough.
- Binary parts (PDFs, images, etc.) weren’t exposed as real attachments.
- Filenames encoded in headers were ignored.
-
Charset Decoding
- Emails in ISO-8859-1, Windows-1252, and other common charsets broke.
- Headers with encoded words (RFC2047) weren’t decoded consistently.
- RFC2231 “filename*=” values (percent-encoded, charset-aware) were dropped.
This meant a huge chunk of real mail was either unreadable or missing critical context.
What I Added
Here’s what my fork introduces:
- ✅ Recursive multipart parsing (including nested multiparts).
- ✅
attachments
array with filename, content type, and decoded content. - ✅ RFC2231 support (
filename*=
with percent-encoding and charset). - ✅ RFC2047 fallback for legacy encoded headers.
- ✅ Charset detection + decoding:
- ✅ UTF-8 / ASCII → pass through.
- ✅ ISO-8859-1, Windows-1252 → mapped to proper Unicode.
- ✅ Others → left raw with a warning.
- ✅ Safe header lookups (
from
,to
,subject
all optional).
Example
Before (stock crystal-mime
):
email = MIME.mail_object_from_raw(raw)
pp email.body_text #=> "Attachment: <garbled characters>"
pp email.attachments #=> nil
After (my fork):
email = MIME.mail_object_from_raw(raw)
puts email.body_text
#=> "Please see attached document."
pp email.attachments
# [Attachment(
# filename: "report.pdf",
# content_type: "application/pdf",
# size: 23456
# )]
Now attachments show up as first-class citizens, and text in common charsets actually renders correctly.
Why It Matters
This isn’t just cosmetic. For anything like:
- Archiving emails,
- Searching across message bodies,
- Indexing attachments, or
- Building compliance tools,
…you need predictable parsing. Losing filenames or botching encodings makes your data worthless.
The Fork
I’ve pushed this to my fork of crystal-mime
:
👉 GitHub: chrisblunt-codes/crystal-mime
If you need these features today, grab the fork. If you maintain the original shard, maybe consider merging.
Lessons Learned
- Email is ugly. The RFCs are a jungle, and real messages bend the rules constantly.
- Crystal is fast. Once the logic is in place, MIME parsing runs like lightning.
- Solve your pain first. I didn’t set out to “fix Crystal MIME” — I just needed my project to work. That motivation was enough.
This fork is just one piece of a larger puzzle: my MailVault project, where I’m building a robust, searchable mail archive with Crystal, SQLite, and FTS5. More on that soon.
Posted by Chris Blunt — tinkering with Crystal, email, and the messy overlap between them.