
What Is Base64? A Plain-English Guide

Base64 encodes binary data into ASCII text so it can travel through text-only protocols. Learn how it works, why files get bigger, and when to use it.


Base64 shows up everywhere (JWTs, data URLs, email attachments, API payloads), but most developers use it without understanding the mechanism. That's fine until you need to debug corrupted data or decide whether it makes sense to embed a 200 KB image as a string in a JSON response. Once you know how the algorithm works, both questions take seconds to answer.

This article explains what Base64 is, how the encoding happens byte by byte, why it exists, and when you should not use it. If you want to compare the tools available for encoding and decoding online, there is a companion post at /en/blog/best-base64-tools with a direct evaluation of each option.

What is Base64

Base64 is an encoding scheme that represents arbitrary binary data using only 64 ASCII characters: uppercase letters (A–Z), lowercase letters (a–z), digits (0–9), plus + and /. The = character is used as padding. The result is a string that any system capable of handling plain text can transmit and store without corruption.

The name comes from exactly that number: 64 distinct symbols. Each symbol represents 6 bits of data (2⁶ = 64). It is important to understand that Base64 is not encryption — there is no key, no secret. It is just a different representation of the same data. Anyone can decode a Base64 string with no additional information.
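Both points are easy to check. The following is a minimal sketch using only Python's standard library; the variable names and the sample bytes are illustrative assumptions, not part of any specification.

```python
import base64
import string

# The 64-symbol alphabet in index order: A-Z, a-z, 0-9, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

original = b"secret-looking data"  # illustrative sample input
encoded = base64.b64encode(original)

# Decoding needs no key or password, only the encoded string itself.
decoded = base64.b64decode(encoded)

print(len(ALPHABET), "symbols")
print(encoded, "->", decoded)
```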

How it works internally: the algorithm step by step

The algorithm operates on groups of 3 input bytes (24 bits) and converts them into 4 Base64 characters. Each character carries 6 of those bits, so the 4 characters hold the same 24 bits, now expressed using only the 64 safe ASCII symbols.

Take the string Man:

  • M = byte 77 = binary 01001101
  • a = byte 97 = binary 01100001
  • n = byte 110 = binary 01101110

Concatenate the 24 bits: 010011010110000101101110

Split into 6-bit groups: 010011 010110 000101 101110

Those decimal values are 19, 22, 5, and 46. Looking up the Base64 table:

  • 19 → T
  • 22 → W
  • 5 → F
  • 46 → u

Result: TWFu. That is exactly what any Base64 library will give you for Man.
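The same steps can be sketched in a few lines of Python. This is a teaching sketch for one complete 3-byte group only, not a replacement for the standard library; encode_group and ALPHABET are names made up for this example.

```python
import base64
import string

# Index-ordered alphabet: A-Z, a-z, 0-9, +, /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch handles only complete 3-byte groups")
    # Pack the three bytes into one 24-bit integer.
    bits = int.from_bytes(three_bytes, "big")
    # Cut four 6-bit slices, most significant first.
    indexes = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)

print(encode_group(b"Man"))                 # TWFu
print(base64.b64encode(b"Man").decode())    # TWFu, from the standard library
```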

If the input length is not a multiple of 3 bytes, the algorithm pads the final group with zero bits to complete the last character and marks the missing bytes with = (one = for 2 remaining bytes, two == for 1 remaining byte). The string Hello has 5 bytes, so it forms two groups with the second one incomplete, and it becomes SGVsbG8=.
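The padding rule can be observed directly with Python's standard base64 module. The three sample inputs below are chosen only to cover the three possible remainders.

```python
import base64

results = {}
for data in (b"Hel", b"Hell", b"Hello"):
    encoded = base64.b64encode(data).decode("ascii")
    leftover = len(data) % 3        # bytes in the final, incomplete group
    padding = encoded.count("=")    # number of = characters appended
    results[data] = (leftover, padding, encoded)
    print(f"{data!r}: leftover={leftover} padding={padding} -> {encoded}")
```

A leftover of 0 produces no padding, a leftover of 2 produces one =, and a leftover of 1 produces ==.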

If you want to try this without installing anything, the Base64 Encoder on QuickEasy runs entirely in the browser and shows the result in real time.

Why Base64 exists: protocols that cannot handle raw bytes

The problem Base64 solves has its roots in the early communication protocols. SMTP, the email protocol, was designed to transport 7-bit ASCII text. Bytes with the most significant bit set (values above 127) could be altered by relay servers that assumed they were dealing with plain text.

HTTP, while more tolerant, uses headers that require text. JSON has no native type for binary data. HTML has no src attribute that accepts raw bytes. So the standard approach became: convert binary into a text representation that survives any transport channel.

That is Base64's reason for existence. It is not efficiency — it is compatibility. You use it because the target channel has constraints that prevent raw binary.

Real use cases in development

Data URLs in HTML and CSS: embedding images directly in HTML using data:image/png;base64,... eliminates an HTTP request for small icons. It is a valid trade-off for assets of a few KB that appear on every page.
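As a rough illustration, a data URL can be assembled like this in Python. The helper name to_data_url is hypothetical, and the sample bytes are only the 8-byte PNG file signature, not a displayable image.

```python
import base64

def to_data_url(data: bytes, mime_type: str) -> str:
    """Build a data: URL from raw bytes (illustrative helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Placeholder input: the PNG signature only, standing in for a real icon file.
png_signature = b"\x89PNG\r\n\x1a\n"
url = to_data_url(png_signature, "image/png")
print(url)
```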

JWT tokens: the header and payload of a JWT are Base64url strings (the URL-safe variant). They are not encrypted by default — anyone with the token can decode and read the payload content. The signature guarantees integrity, not confidentiality. This misconception causes security bugs.
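To make the "anyone can read it" point concrete, here is a sketch that builds a hypothetical token and then reads its payload without any key. The claims, the helper names, and the signature placeholder are all assumptions for the demo; nothing here verifies a signature, which a real application must do with a JWT library.

```python
import base64
import json

def b64url_encode(raw: bytes) -> str:
    # Base64url without the trailing = padding, as JWTs use it.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    # Restore the omitted padding before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)

# Hypothetical token contents, for illustration only.
header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "user-123", "admin": False}
token = ".".join([
    b64url_encode(json.dumps(header).encode("utf-8")),
    b64url_encode(json.dumps(payload).encode("utf-8")),
    "signature-placeholder",  # not a real signature
])

# Reading the claims requires no secret at all.
header_part, payload_part, _signature = token.split(".")
decoded_header = json.loads(b64url_decode(header_part))
decoded_payload = json.loads(b64url_decode(payload_part))
print(decoded_header, decoded_payload)
```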

Email attachments (MIME): the PDFs, images, and other binary files you send by email are typically Base64-encoded before being transmitted as part of the message body. Your email client decodes them automatically.
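Python's standard email package shows this behavior. The sketch below attaches arbitrary bytes to a message and inspects the transfer encoding the library picked; the subject, body text, and filename are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"                 # placeholder subject
msg.set_content("See the attached file.")

# Arbitrary binary bytes standing in for a real attachment.
binary_data = bytes(range(256))
msg.add_attachment(
    binary_data,
    maintype="application",
    subtype="octet-stream",
    filename="report.bin",                # placeholder filename
)

attachment = next(msg.iter_attachments())
print(attachment["Content-Transfer-Encoding"])
# The library decodes the attachment back to the original bytes on request.
print(attachment.get_content() == binary_data)
```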

REST APIs transporting files: when an endpoint needs to accept a file inside a JSON payload, the alternative to multipart/form-data is encoding the content in Base64 and including it as a string field. It is convenient for small files; for large files, multipart is much better.
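A minimal round trip of that pattern might look like the following. The field names filename and content_base64 are invented for this example and do not come from any particular API.

```python
import base64
import json

file_bytes = b"\x00\x01\x02binary\xff"  # sample bytes that are not valid UTF-8 text

# Client side: wrap the bytes in a JSON-safe string field.
body = json.dumps({
    "filename": "example.bin",
    "content_base64": base64.b64encode(file_bytes).decode("ascii"),
})

# Server side: parse the JSON, then decode strictly so that stray
# characters raise an error instead of being silently skipped.
received = json.loads(body)
restored = base64.b64decode(received["content_base64"], validate=True)
print(restored == file_bytes)
```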

Storing binary data in relational databases: some systems store images or certificates in TEXT columns as Base64. It works, but BYTEA (PostgreSQL) or BLOB (MySQL) are more efficient — they store native binary without the encoding overhead.

The 33% overhead: why files get bigger

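Base64 converts every 3 bytes into 4 characters, so the padded output is always ceil(n / 3) * 4 bytes, where n is the original size. For anything beyond a few bytes, that works out to roughly 33% larger; very small inputs grow by more because of padding (a single byte becomes 4 characters).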

A 100 KB PNG becomes a ~133 KB Base64 string. A 1 MB image becomes ~1.33 MB. This overhead is real and has a direct impact on transfer time and memory usage if you are manipulating large strings in JavaScript.
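The size formula can be checked against the standard library. In this sketch, encoded_size is an illustrative helper and the zero-filled inputs are placeholders, since only the length matters.

```python
import base64

def encoded_size(n: int) -> int:
    """Length of padded Base64 output for n input bytes: ceil(n / 3) * 4."""
    return 4 * ((n + 2) // 3)

for n in (1, 2, 3, 100, 100 * 1024):
    actual = len(base64.b64encode(bytes(n)))
    ratio = actual / n
    print(f"{n:>7} bytes -> {actual:>7} chars (x{ratio:.2f}), formula: {encoded_size(n)}")
```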

Beyond size, long Base64 strings are opaque to diff tools, logging, and debugging. A 500 KB binary inline in a JSON field will appear as a massive single line in your logs. Consider this before deciding to embed.

Standard Base64 vs URL-safe Base64

The standard variant uses + and / for values 62 and 63, the last two symbols of the alphabet. Those two characters have special meaning in URLs: + is an encoded space in application/x-www-form-urlencoded, and / separates path segments.

The URL-safe variant (defined in RFC 4648) replaces + with - and / with _, and typically omits the = padding. This allows the resulting string to appear in query strings, path parameters, and cookies without additional percent-encoding.
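The difference between the two alphabets is visible with a two-byte input picked specifically so that the output contains values 62 and 63. The input bytes are an assumption chosen for the demo.

```python
import base64

raw = b"\xfb\xff"  # chosen so the encoded form uses values 62 and 63

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
url_safe_unpadded = url_safe.rstrip("=")

print(standard)            # +/8=
print(url_safe)            # -_8=
print(url_safe_unpadded)   # -_8
```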

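JWT uses Base64url. The hard-to-track bugs appear when standard Base64 ends up in a URL instead: + becomes a space in some parsers, and decoding then fails or silently returns the wrong bytes. The same mismatch happens in reverse when a Base64url segment is passed to a decoder that expects the standard alphabet.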
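The + to space problem can be reproduced with Python's own query-string parser. The parameter names std and safe are made up, and the input bytes are chosen so that the standard encoding consists entirely of + characters.

```python
import base64
from urllib.parse import parse_qs

raw = b"\xfb\xef\xbe"  # every 6-bit group has value 62

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

# Drop both values into a query string without percent-encoding them.
query = f"std={standard}&safe={url_safe}"
parsed = parse_qs(query, keep_blank_values=True)

print(repr(standard), "->", repr(parsed["std"][0]))   # + characters are now spaces
print(repr(url_safe), "->", repr(parsed["safe"][0]))  # unchanged
```

Percent-encoding the standard form (for example with urllib.parse.quote) also avoids the problem, at the cost of a longer URL.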

When not to use Base64

Base64 makes no sense for transferring large files. If a user is uploading a 500 MB video, Base64 encoding means you will have a ~667 MB string in memory on the server before you start processing. multipart/form-data exists precisely for this: it enables streaming without loading everything into memory.
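When Base64 cannot be avoided for a larger payload, one way to at least keep the whole string out of memory is to encode in chunks whose size is a multiple of 3, so that only the final chunk can carry padding. The sketch below assumes a stream whose read(n) returns n bytes until the end of the data, as in-memory buffers and buffered files do; encode_stream and CHUNK_SIZE are illustrative names.

```python
import base64
import io

CHUNK_SIZE = 3 * 1024  # multiple of 3, so no padding appears mid-stream

def encode_stream(src, dst, chunk_size: int = CHUNK_SIZE) -> None:
    """Base64-encode src into dst one chunk at a time."""
    if chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a multiple of 3")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(base64.b64encode(chunk))

# Demo with in-memory streams standing in for real files.
data = bytes(range(256)) * 50
source, target = io.BytesIO(data), io.BytesIO()
encode_stream(source, target)
print(target.getvalue() == base64.b64encode(data))
```

This limits peak memory but does nothing about the 33% size overhead, so multipart remains the better fit for uploads.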

It also makes no sense when the channel already supports binary. WebSockets have native binary frames. gRPC uses Protocol Buffers with native byte support. HTTP/2 and HTTP/3 are binary protocols — Base64 overhead adds nothing if you control both sides of the communication.

And finally, Base64 is not a security measure. Data encoded in Base64 is not protected — it is just in a different format. If you need confidentiality, use real encryption.

Frequently asked questions

Is Base64 the same as encryption?

No. Base64 is encoding, not encryption. There is no key or secret involved. Anyone with the Base64 string can decode it and get the original data instantly. If you need to protect information, use AES, RSA, or another real cryptographic algorithm.

Why does my JWT have a period in the middle of the Base64 string?

A JWT is not a single Base64 string — it is three Base64url strings separated by periods: header.payload.signature. Each part is independently encoded. The header describes the algorithm, the payload carries the claims, and the signature is calculated over the first two.

Can I use Base64 to store passwords?

Definitely not. Base64 is reversible with no additional information — it is just a different representation. For passwords, use a hash function with salt like bcrypt, scrypt, or Argon2. Those algorithms are intentionally slow and non-reversible.
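For contrast with Base64, here is a rough sketch of salted password hashing using scrypt from Python's standard library (available when Python is built against a recent OpenSSL). The cost parameters n, r, and p are illustrative values rather than a tuned recommendation, and the function names are made up for this example.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt=None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    # Illustrative cost parameters; tune them for your own hardware.
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Unlike a Base64 string, the stored digest cannot be turned back into the password; verification works only by hashing a candidate with the same salt and comparing.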

Why does the Base64 string sometimes end with = or ==?

The padding indicates that the input length was not a multiple of 3 bytes. One = means 2 bytes were left in the last group (3 real characters + 1 padding), and == means 1 byte was left (2 real characters + 2 padding). In the URL-safe variant, padding is typically omitted because the decoder can infer it from the string length.

When the knowledge makes a difference

Understanding Base64 as a mechanism — not as magic — changes how you debug encoding problems. When a JSON payload arrives corrupted, you know to check whether + was accidentally converted to a space. When a JWT fails to validate, you know to check whether the URL-safe variant was used correctly. When someone suggests embedding images in Base64 in every API response, you have the numbers to push back: 33% overhead, everything in memory, no possibility of separate caching. The format is simple; the consequences of using it wrong are not.

Author
Rafael Duarte
Backend developer with experience in fintech and B2B SaaS who has worked on teams that scaled APIs from zero to millions of requests. Carries enough production scars to have strong opinions about tools, patterns, and architecture decisions. Not an academic: he read the UUID RFC when he needed to choose between v4 and v7 for a write-heavy table.