HTML file with strange characters -...
Hello!
I have a program of my own that generates an HTML code, for a page of my website.
When I upload the generated HTML file to my website, it shows � instead of 'Ç', � instead of 'Ã'. So, it shows � for any international character (Ç, Ã, É, ç, ã, é, ê, í, etc).
But if I open the file in The HTML Editor and save the file, without changing anything, without touching enter key, or any other key, just click on Save icon and then upload the file again, it shows the international characters correctly.
<meta charset='utf-8'> is used in the <HEAD>.
What is wrong with the generated file?
Thank you
Answers in https://stackoverflow.com/questions/322 … s-expected suggest to me that The HTML Editor may be correcting the coding to match the utf-8 setting in the head code.
Perhaps comparing the code generated by your program with the code saved by The HTML Editor in a plain text editor will shed some light on the matter. I use TextPad, which includes a facility for such comparisons, but there are probably other plain text editors that will do the same sort of thing.
Frank
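If no editor with a compare feature is at hand, the same comparison can be sketched in a few lines of Python. This is only a rough illustration: the function name first_difference and the file names in the usage note are made up for the example, and it compares raw bytes rather than text, since the two files may look identical on screen while differing in encoding.

```python
from pathlib import Path


def first_difference(path_a, path_b):
    """Return (offset, byte_a, byte_b) for the first differing byte, or None.

    If one file is a strict prefix of the other, the offset is the length of
    the shorter file and both byte values are None.
    """
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset, x, y
    if len(a) != len(b):
        return min(len(a), len(b)), None, None
    return None
```

For example, first_difference("generated.html", "resaved.html") would point at the first byte where the generator's output and the editor's output diverge, which is typically the first accented character if the two files use different encodings.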
This happens to me too. Usually it’s an encoding mismatch. If your file isn’t properly saved as UTF-8 (or the charset the browser expects), you’ll see weird characters. Also, if the document type is old or mismatched, that can mess things up. I’d try changing the DOCTYPE to a simpler one (like <!DOCTYPE html>) and making sure you have <meta charset="utf-8"> in your <head>.
If after that it still shows weird symbols, it might be the way CoffeeCup is saving the file (e.g. without BOM) or some editor feature inserting “smart” characters. Checking whether those characters are already in the file before upload, compared with after publishing, might help narrow it down.
Have you tried the tool 'Search for undocumented characters'? You will find it in the 'Tools' menu.
Have a really good day!
Inger, Norway
My work in progress:
Components for Site Designer and the HTML Editor: https://mock-up.coffeecup.com
This almost always happens because the file itself is not saved in UTF-8, even though your HTML declares UTF-8.
Your editor fixes the problem simply because when it saves the file, it re-encodes it as UTF-8, so your server finally receives a correctly encoded file.
What’s going wrong
Your program is most likely generating the HTML in ANSI / Latin-1 / Windows-1252, not UTF-8.
When the browser reads the file, it thinks it's UTF-8 (because of <meta charset='utf-8'>) but the bytes don’t match UTF-8 encoding → it shows �.
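A small Python sketch can make the mismatch concrete. It assumes the generator wrote Windows-1252 (Python calls it cp1252), which is only a guess about the actual program, and it imitates the browser by decoding with UTF-8 and substituting U+FFFD for invalid bytes.

```python
text = "Ç"

# The same character stored in two different encodings.
legacy_bytes = text.encode("cp1252")  # one byte: C7
utf8_bytes = text.encode("utf-8")     # two bytes: C3 87

# Decode both as UTF-8, as a browser would after reading the meta tag.
# errors="replace" turns invalid byte sequences into U+FFFD (the � symbol).
shown_legacy = legacy_bytes.decode("utf-8", errors="replace")
shown_utf8 = utf8_bytes.decode("utf-8", errors="replace")

print(repr(shown_legacy))  # '\ufffd'
print(repr(shown_utf8))    # 'Ç'
```

The lone byte C7 is the start of a two-byte UTF-8 sequence with nothing valid after it, so the decoder gives up on it and shows the replacement character.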
How to confirm
Open the raw file before saving it in The HTML Editor and check:
Its encoding (most editors show this at the bottom)
Whether changing it to UTF-8 fixes the issue
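If your editor does not display the encoding, the check can also be done with a short script. The following is a minimal sketch, not a full encoding detector: the function name check_utf8 is made up for this example, and it only answers whether the bytes are valid UTF-8 and whether the file starts with a UTF-8 BOM.

```python
def check_utf8(path):
    """Return (is_valid_utf8, has_bom, first_bad_offset) for the file at path."""
    with open(path, "rb") as f:
        data = f.read()
    has_bom = data.startswith(b"\xef\xbb\xbf")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset where decoding first failed.
        return False, has_bom, exc.start
    return True, has_bom, None
```

Run it on the generated file before opening it in The HTML Editor. A result starting with False would support the theory that the generator is not writing UTF-8. A file containing only plain ASCII always passes, so test with a page that contains accented characters.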
How to fix the generator
Ensure your generator outputs the file encoded as UTF-8, not just “with UTF-8 meta tag”.
For example:
In Python: open('file.html', 'w', encoding='utf-8')
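In PHP: file_put_contents('file.html', $html); (this writes the bytes of $html unchanged, so the string must already be UTF-8; if it is not, convert it first, e.g. mb_convert_encoding($html, 'UTF-8', 'Windows-1252'))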
In Java: new OutputStreamWriter(fos, StandardCharsets.UTF_8)
In C#: File.WriteAllText(path, html, Encoding.UTF8);
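If the generator itself cannot be changed right away, an already generated file can be re-encoded as a stopgap, which is roughly what saving it in the editor appears to do. The sketch below assumes the source file is Windows-1252; that is a guess, and the function name reencode_to_utf8 and its parameters are made up for the example.

```python
def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Read src using source_encoding and write the same text to dst as UTF-8."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src, "r", encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

If the guess about the source encoding is wrong, the output will be valid UTF-8 but contain the wrong characters, so it is worth opening the result and checking a few accented words.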
Hi, guys,
I have a lot of HTML4 files. My PC crashed, and I want to convert my heap of HTML4 files into HTML5 brilliance.
I did some searching and decided to use the CoffeeCup Free HTML Editor.
The first obstacle was the same phenomenon you ran into.
I created a simple sample and loaded it into the Editor.
I loaded a Russian word (in English it means "after"; I chose Russian because in UTF-8 the word takes 10 bytes):
После
and converted it to numeric codes.
For the font "Tahoma",
in the editor it looks like
Ïîňëå
decimal values are:
207 238 235 229
hexadecimal values are:
CF EE 0148 EB E5
for font "Lucida Sans Unicode"
hexadecimal values are:
CF EE 0148 EB E5
In UTF-8, Russian text uses two bytes per character, but the editor shows only five characters: Ïîňëå
The question is:
How can that be?
I looked at the Windows Character Map and found this:
the HTML Editor uses it to convert UTF-8 encoded text into the sequence of characters shown in its internal editor window.
As you can see, the ten bytes are transformed into 5 characters: Ïîňëå
I believe the company's developers can find a way to fix this fault (I hope).
For now I am forced to use html5-editor.net: it is absolutely free and very easy to use.
Rake48.
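One possible reading of the numbers above can be checked with a short Python sketch. It rests on an assumption not confirmed anywhere in this thread: that the sample file was actually stored in Windows-1251 (one byte per Cyrillic letter) rather than UTF-8, and that the editor displayed those bytes using a Western code page.

```python
word = "После"

utf8 = word.encode("utf-8")      # two bytes per Cyrillic letter
cp1251 = word.encode("cp1251")   # one byte per Cyrillic letter

print(len(utf8), utf8.hex(" ").upper())      # 10 D0 9F D0 BE D1 81 D0 BB D0 B5
print(len(cp1251), cp1251.hex(" ").upper())  # 5 CF EE F1 EB E5

# The five Windows-1251 bytes, displayed as if they were Windows-1252.
print(cp1251.decode("cp1252"))               # Ïîñëå
```

Four of the five Windows-1251 bytes (CF, EE, EB, E5) match the hexadecimal values reported above, and the Windows-1252 rendering is close to what the editor showed, although the middle character differs (ñ here, ň in the post). So this is a hint that the file holds five single-byte characters rather than ten UTF-8 bytes, not proof.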