HTML file with strange characters -...

User 2939500 Photo


Registered User
2 posts

Hello!
I have a program of my own that generates the HTML code for a page of my website.
When I upload the generated HTML file to my website, it shows � instead of 'Ç' and � instead of 'Ã'. In fact, it shows � for any international character (Ç, Ã, É, ç, ã, é, ê, í, etc).
But if I open the file in The HTML Editor and save it without changing anything (without pressing Enter or any other key, just clicking the Save icon) and then upload the file again, it shows the international characters correctly.
<meta charset='utf-8'> is used in the <HEAD>.
What is wrong with the generated file?
Thank you
User 379556 Photo


Registered User
1,635 posts

Answers in https://stackoverflow.com/questions/322 … s-expected suggest to me that The HTML Editor may be correcting the coding to match the utf-8 setting in the head code.

Perhaps a comparison in a plain text editor of the code generated by your program with the code from The HTML Editor will shed some light on the matter. I use TextPad, which includes a facility for such comparisons, but there are probably other plain text editors which will do the same sort of thing.
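If a text editor hides the difference, comparing the two files byte by byte may show it more directly. This is only a minimal sketch in Python: the function names are made up for illustration, and the sample bytes stand in for the generated file and the copy re-saved by The HTML Editor (assuming, as a guess, that the generator writes Windows-1252).

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other.
        return min(len(a), len(b))
    return None


def hex_at(data: bytes, offset: int, width: int = 2) -> str:
    """Show a few bytes starting at offset as space-separated hex."""
    return data[offset:offset + width].hex(" ")


# Stand-ins for the two files (assumption: the generator writes Windows-1252).
generated = "<p>Ç</p>".encode("cp1252")
resaved = "<p>Ç</p>".encode("utf-8")

pos = first_difference(generated, resaved)
print(pos, hex_at(generated, pos), "|", hex_at(resaved, pos))
```

With real files, the two byte strings could be read with `pathlib.Path(...).read_bytes()` instead of being built in memory.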

Frank
User 3210806 Photo


Guest
3 posts

This happens to me too. Usually it’s an encoding mismatch: if your file isn’t actually saved as UTF-8 (or whatever charset the browser expects), you’ll see weird characters. Also, if the document type is old or mismatched, that can mess things up. I’d try changing the DOCTYPE to a simpler one (like <!DOCTYPE html>) and making sure you have <meta charset="utf-8"> in your <head>.

If it still shows weird symbols after that, it might be the way CoffeeCup is saving the file (e.g. without BOM) or some editor feature inserting “smarts”. Checking whether those characters are already in the file before upload vs. after publishing might help narrow it down.
User 122279 Photo


Senior Advisor
14,691 posts

Have you tried using the tool 'Search for undocumented characters'? You'll find it in the 'Tools' menu.
Have a really good day!
Inger, Norway

My work in progress:
Components for Site Designer and the HTML Editor: https://mock-up.coffeecup.com


User 3213818 Photo


Guest
3 posts

This almost always happens because the file itself is not saved in UTF-8, even though your HTML declares UTF-8.

Your editor fixes the problem simply because when it saves the file, it re-encodes it as UTF-8, so your server finally receives a correctly encoded file.

What’s going wrong

Your program is most likely generating the HTML in ANSI / Latin-1 / Windows-1252, not UTF-8.
When the browser reads the file, it thinks it's UTF-8 (because of <meta charset='utf-8'>) but the bytes don’t match UTF-8 encoding → it shows �.
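The mismatch can be reproduced in a few lines of Python. This is a sketch under the assumption that the generator writes Windows-1252 (`cp1252` in Python); the variable names are only illustrative.

```python
text = "Ç Ã É"

# What the generator probably writes: one byte per accented letter.
legacy_bytes = text.encode("cp1252")

# What the <meta charset='utf-8'> tag promises: two bytes per accented letter.
utf8_bytes = text.encode("utf-8")

# A browser decoding the legacy bytes as UTF-8 hits invalid sequences and
# substitutes U+FFFD (the replacement character) for each of them.
shown_by_browser = legacy_bytes.decode("utf-8", errors="replace")

print(legacy_bytes.hex(" "))  # c7 20 c3 20 c9
print(utf8_bytes.hex(" "))    # c3 87 20 c3 83 20 c3 89
print(shown_by_browser)       # � � �
```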

How to confirm

Open the raw file before saving it in The HTML Editor and check:

Its encoding (most editors show this at the bottom)

Whether changing it to UTF-8 fixes the issue
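If your editor does not show the encoding, a small script can do the check instead. A minimal sketch in Python: the helper names are hypothetical, and the sample bytes stand in for the contents of the generated file (which you could load with `pathlib.Path(...).read_bytes()`).

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def starts_with_bom(data: bytes) -> bool:
    """True if the data begins with the UTF-8 byte order mark."""
    return data.startswith(UTF8_BOM)


# Stand-ins for file contents (assumption: the generator writes Windows-1252).
sample_legacy = "<p>Ç</p>".encode("cp1252")
sample_utf8 = "<p>Ç</p>".encode("utf-8")

print(first_invalid_utf8_offset(sample_legacy))  # 3
print(first_invalid_utf8_offset(sample_utf8))    # None
print(starts_with_bom(sample_utf8))              # False
```

A file made only of ASCII characters passes this check whatever encoding the generator uses, so test with a page that contains accented letters.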

How to fix the generator

Ensure your generator outputs the file encoded as UTF-8, not just “with UTF-8 meta tag”.

For example:

In Python: open('file.html', 'w', encoding='utf-8')

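In PHP: file_put_contents('file.html', $html); (make sure $html is already UTF-8; mb_internal_encoding("UTF-8") alone does not convert it, but mb_convert_encoding($html, 'UTF-8', 'Windows-1252') does)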

In Java: new OutputStreamWriter(fos, StandardCharsets.UTF_8)

In C#: File.WriteAllText(path, html, Encoding.UTF8);
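If the generator cannot be changed right away, the same fix can be applied to its output afterwards. This is only a sketch in Python: the function name is made up, and Windows-1252 as the source encoding is an assumption that you should replace with whatever your program really writes.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return raw as UTF-8 bytes, converting from source_encoding if needed."""
    try:
        raw.decode("utf-8")
        return raw  # Already valid UTF-8: leave it untouched.
    except UnicodeDecodeError:
        pass
    # Assumption: anything that is not valid UTF-8 is in source_encoding.
    # Bytes that are undefined in that encoding raise UnicodeDecodeError here.
    return raw.decode(source_encoding).encode("utf-8")


# Stand-in for the generated file: "Coração" written as Windows-1252.
generated = b"<p>Cora\xe7\xe3o</p>"
fixed = reencode_to_utf8(generated)

print(fixed.decode("utf-8"))              # <p>Coração</p>
print(reencode_to_utf8(fixed) == fixed)   # True: running it twice is harmless
```

For a real file, read it with `pathlib.Path("page.html").read_bytes()` and write the result back with `write_bytes()`; `page.html` is just a placeholder name.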
User 3214215 Photo


Guest
1 post

Hi, guys,

I have a lot of HTML4 files. My PC crashed, and I want to convert my heap of HTML4 into HTML5 brilliance.
I did some searching and decided to use the CoffeeCup Free HTML Editor.
The first obstacle was the same phenomenon you ran into.
I created a simple sample and loaded it into the Editor.

I loaded a Russian word (in English it means "after"; I chose Russian because in UTF-8 the word takes 10 bytes):
После
and converted it to numeric character codes.
For the font "Tahoma":

In the editor it looks like this:
Ïîňëå
decimal values are:
207 238 328 235 229
hexadecimal values are:
CF EE 0148 EB E5

For the font "Lucida Sans Unicode":
hexadecimal values are:
CF EE 0148 EB E5

In UTF-8, each Russian letter takes two bytes, but the editor shows only five characters: Ïîňëå
The question is:
How can that be?
I looked at the Windows character map and concluded that the HTML Editor uses it to convert UTF-8 encoded text into the sequence of characters shown in its internal editor window.
As you can see, the ten bytes are turned into just 5 characters: Ïîňëå
I hope the company's developers can find a way to fix this fault.
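One possible explanation, offered only as a guess: the sample file may have been stored in the single-byte Windows-1251 code page rather than in UTF-8, which would account for five characters instead of ten bytes. The Python sketch below compares the two encodings of the word; note that it produces ñ where the post above shows ň, so the exact mapping the editor applies may differ.

```python
word = "После"

utf8 = word.encode("utf-8")      # two bytes per Cyrillic letter
cp1251 = word.encode("cp1251")   # one byte per Cyrillic letter

print(len(utf8), utf8.hex(" "))      # 10 d0 9f d0 be d1 81 d0 bb d0 b5
print(len(cp1251), cp1251.hex(" "))  # 5 cf ee f1 eb e5

# The five Windows-1251 bytes read as Western European (Windows-1252) text:
print(cp1251.decode("cp1252"))       # Ïîñëå
```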

Now I am forced to use html5-editor.net: it is absolutely free and very easy to use.

Rake48.
