Researchers hide information in plain text

12 May 2018

Computer scientists at Columbia University's School of Engineering and Applied Science (Columbia Engineering) have invented FontCode, a new way to embed hidden information in ordinary text by imperceptibly changing, or perturbing, the shapes of fonts in text.

FontCode creates font perturbations and uses them to encode a message that can later be decoded. The method works with most fonts and, unlike other methods for hiding information in text and documents, works with most document types; the hidden information survives even when the document is printed on paper or converted to another file type.

The paper will be presented at the annual conference on computer graphics, SIGGRAPH, in Vancouver, British Columbia, from 12 to 16 August.

"While there are obvious applications for espionage, we think FontCode has even more practical uses for companies wanting to prevent document tampering or protect copyrights, and for retailers and artists wanting to embed QR codes and other metadata without altering the look or layout of a document," says Changxi Zheng, associate professor of computer science and the paper's senior author.

Zheng created FontCode with his students Chang Xiao (PhD student) and Cheng Zhang MS'17 (now a PhD student at University of California at Irvine) as a text steganographic method that can embed text, metadata, a URL, or a digital signature into a text document or image, whether it's digitally stored or printed on paper.

It works with common font families, such as Times Roman, Helvetica, and Calibri, and is compatible with most word processing programs, including Word and FrameMaker, as well as image-editing and drawing programs, such as Photoshop and Illustrator. Since each letter can be perturbed, the amount of information conveyed secretly is limited only by the length of the regular text. Information is encoded using minute font perturbations — changing the stroke width, adjusting the height of ascenders and descenders, or tightening or loosening the curves in serifs and the bowls of letters like o, p, and b.

"Changing any letter, punctuation mark, or symbol into a slightly different form allows you to change the meaning of the document," says Xiao, the paper's lead author. "This hidden information, though not visible to humans, is machine-readable just as barcodes and QR codes are instantly readable by computers. However, unlike barcodes and QR codes, FontCode doesn't mar the visual aesthetics of the printed material, and its presence can remain secret."

Data hidden using FontCode can be extremely difficult to detect. Even if an attacker detects font changes between two texts, which is highly unlikely given the subtlety of the perturbations, it simply isn't practical to scan every file entering or leaving a company.

Furthermore, FontCode not only embeds but can also encrypt messages. While the perturbations are stored in a numbered location in a codebook, their locations are not fixed. People wanting to communicate through encrypted documents would agree on a private key that specifies the particular locations, or order, of perturbations in the codebook.
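As a rough illustration of the idea, the sketch below derives a codebook ordering from a shared key. The function names, the codebook size of 8 and the use of Python's `random.Random` as the keyed shuffle are assumptions made for this example, not details of FontCode; a real system would need a cryptographically secure keyed permutation.

```python
import random


def keyed_order(key: str, codebook_size: int) -> list[int]:
    """Return a permutation of codebook locations determined by the key.

    Hypothetical helper: random.Random is NOT cryptographically secure
    and is used here only to show that both parties derive the same order.
    """
    order = list(range(codebook_size))
    random.Random(key).shuffle(order)
    return order


def to_location(value: int, order: list[int]) -> int:
    """Sender: map a value to the codebook location used to write it."""
    return order[value]


def from_location(location: int, order: list[int]) -> int:
    """Receiver: map an observed codebook location back to the value."""
    return order.index(location)


# Both parties build the same ordering from the agreed key.
sender_order = keyed_order("shared secret", 8)
receiver_order = keyed_order("shared secret", 8)
location = to_location(5, sender_order)
recovered = from_location(location, receiver_order)
```

Without the key, an observer who spots a perturbation sees only its codebook location, not the value that location stands for.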

"Encryption is just a backup level of protection in case an attacker can detect the use of font changes to convey secret information," says Zheng. "It's very difficult to see the changes, so they are really hard to detect — this makes FontCode a very powerful technique to get data past existing defenses."

FontCode is not the first technology to hide a message in text — programs exist to hide messages in PDF and Word files or to resize whitespace to denote a 0 or 1 — but, the researchers say, it is the first to be document-independent and to retain the secret information even when a document or an image with text (PNG, JPG) is printed or converted to another file type. This means a FrameMaker or Word file can be converted to PDF, or a JPEG can be converted to PNG, all without losing the secret information.

To use FontCode, you would supply a secret message and a carrier text document. FontCode converts the secret message to a bit string (ASCII or Unicode) and then into a sequence of integers. Each integer is assigned to a five-letter block in the regular text where the numbered codebook locations of each letter sum to the integer.
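The first step of that pipeline, turning a message into a bit string and then into integers, might look like the following minimal sketch. The block width (`bits_per_block`) and the zero-padding scheme are assumptions for illustration; the article does not say how FontCode sizes its integers, and the mapping of integers onto five-letter blocks is not shown.

```python
def message_to_integers(message: str, bits_per_block: int) -> list[int]:
    """Convert an ASCII message to a bit string, then to a list of integers."""
    data = message.encode("ascii")
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zeros so the bit string splits evenly into blocks (assumption).
    bits += "0" * (-len(bits) % bits_per_block)
    return [int(bits[i:i + bits_per_block], 2)
            for i in range(0, len(bits), bits_per_block)]


def integers_to_message(values: list[int], bits_per_block: int, n_bytes: int) -> str:
    """Reverse the conversion; n_bytes tells the decoder where padding starts."""
    bits = "".join(f"{v:0{bits_per_block}b}" for v in values)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))
    return data.decode("ascii")


blocks = message_to_integers("Hi", 4)   # [4, 8, 6, 9]
original = integers_to_message(blocks, 4, 2)
```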

Recovering hidden messages is the reverse process. From a digital file or from a photograph taken with a smartphone, FontCode matches each perturbed letter to the original perturbation in the codebook to reconstruct the original message.

Matching is done using convolutional neural networks (CNNs). Recognising vector-drawn fonts (such as those stored as PDFs or created with programs like Illustrator) is straightforward since shape and path definitions are computer-readable. However, it's a different story for PNG, IMG, and other rasterized (or pixel) fonts, where lighting changes, differing camera perspectives, noise, or blurriness may mask part of a letter and prevent easy recognition.

While CNNs are trained to take into account such distortions, recognition errors will still occur, and a key challenge for the researchers was ensuring a message could always be recovered in the face of such errors. Redundancy is one obvious way to recover lost information, but it doesn't work well with text since redundant letters and symbols are easy to spot.

Instead, the researchers turned to the 1,700-year-old Chinese Remainder Theorem, which identifies an unknown number from its remainder after it has been divided by several different divisors. The theorem has been used to reconstruct missing information in other domains; in FontCode, researchers use it to recover the original message even when not all letters are correctly recognised.
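A small worked example may help. The sketch below reconstructs a number from its remainders using the standard Chinese Remainder Theorem construction; the moduli 3, 5 and 7 are chosen only for illustration and are not FontCode's.

```python
from math import prod


def chinese_remainder(remainders: list[int], moduli: list[int]) -> int:
    """Find x with x % m == r for each pair; moduli must be pairwise coprime."""
    total = prod(moduli)
    x = 0
    for r, m in zip(remainders, moduli):
        partial = total // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    return x % total


# A number that leaves remainder 2 when divided by 3, 3 when divided
# by 5, and 2 when divided by 7:
unknown = chinese_remainder([2, 3, 2], [3, 5, 7])   # 23
```

The answer is unique among numbers smaller than the product of the moduli, here 3 × 5 × 7 = 105.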

"Imagine having three unknown variables," says Zheng. "With three linear equations, you should be able to solve for all three. If you increase the number of equations from three to five, you can solve the three unknowns as long as you know any three out of the five equations."
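The same idea can be sketched with remainders instead of linear equations. In the example below, a number is stored as five remainders, and any three are enough to rebuild it, provided the number is smaller than the product of the three smallest moduli. The moduli and helper names are hypothetical, and the sketch assumes the decoder knows which remainders are missing; it does not handle remainders that are wrong rather than missing.

```python
from itertools import combinations
from math import prod

MODULI = [7, 9, 11, 13, 16]        # pairwise coprime, illustrative only
LIMIT = prod(sorted(MODULI)[:3])   # 693: values below this need only 3 remainders


def encode(value: int) -> list[int]:
    """Store a value as one remainder per modulus (two are redundant)."""
    if not 0 <= value < LIMIT:
        raise ValueError(f"value must be in [0, {LIMIT})")
    return [value % m for m in MODULI]


def recover(remainders: list) -> int:
    """Rebuild the value from the remainders that survived (None = lost)."""
    known = [(r, m) for r, m in zip(remainders, MODULI) if r is not None]
    if len(known) < 3:
        raise ValueError("need at least three remainders")
    total = prod(m for _, m in known)
    x = 0
    for r, m in known:
        partial = total // m
        x += r * partial * pow(partial, -1, m)
    return x % total


stored = encode(500)
# Lose any two of the five remainders and the value still comes back.
for kept in combinations(range(len(MODULI)), 3):
    damaged = [r if i in kept else None for i, r in enumerate(stored)]
    assert recover(damaged) == 500
```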

Using the Chinese Remainder Theorem, the researchers demonstrated they could recover messages even when 25 per cent of the letter perturbations were not recognised. Theoretically the error rate could go higher than 25 per cent.

The authors, who have filed a patent with Columbia Technology Ventures, plan to extend FontCode to other languages and character sets, including Chinese.

"We are excited about the broad array of applications for FontCode," says Zheng, "from document management software, to invisible QR codes, to protection of legal documents. FontCode could be a game changer."
