Author Topic: ASCII Characters  (Read 19288 times)

Offline Not Myself

  • Earth
  • ***
  • Posts: 217
  • Unwanted Irritant
Re: ASCII Characters
« Reply #15 on: January 28, 2013, 09:35:29 AM »
If you have a table of codes like this one,

http://www.asciitable.com/

you can display the character by typing in the number of the character while holding down the ALT key.  For example, if I press and hold ALT while typing 234 235 236 237 238, I get Ωδ∞φε.  You can also copy and paste characters from other sources.

My best guess is that a conversion is taking place, and it is on your computer.  The same non-ASCII codes entered by me produce êëìíî.  The "ALT" method doesn't work for me, as I'm on a different OS - I wrote a quicky C program to output these character codes to a text file, then cut and paste into the browser window.
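
The difference between the two results can be reproduced with a short sketch. This is Python rather than the C program mentioned above, and it assumes the second set of characters comes from Latin-1 (ISO 8859-1), which the thread does not confirm:

```python
# Interpret the same five byte values under two different 8-bit code pages.
# Assumption: the second system reads them as Latin-1 (ISO 8859-1);
# the thread itself does not say which code page was in use.
codes = [234, 235, 236, 237, 238]
raw = bytes(codes)

as_cp437 = raw.decode("cp437")     # IBM PC code page 437
as_latin1 = raw.decode("latin-1")  # ISO 8859-1

print(as_cp437)
print(as_latin1)
```

The bytes are identical in both cases; only the table used to turn them into characters changes.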

The internet - where bigfoot is real and the moon landings aren't.

Offline cjameshuff

  • Mars
  • ***
  • Posts: 326
Re: ASCII Characters
« Reply #16 on: January 28, 2013, 10:28:19 AM »
That "extended ASCII" table is one of multiple 8-bit encodings that add to ASCII, not compatible with UTF-8 and not itself part of ASCII. In particular, that looks like IBM code page 437, which at this point I would only expect to work on MS operating systems. Some browsers may assume you intend to send the exact characters you type, others may convert to Unicode.

You might be better off copying characters from a site like: http://unicodelookup.com/

Offline Bob B.

  • Jupiter
  • ***
  • Posts: 819
  • Bob the Excel Guru™
    • Rocket & Space Technology
Re: ASCII Characters
« Reply #17 on: January 28, 2013, 10:32:03 AM »
I've noticed that I can't get all the codes to work here.  For instance, the entire Greek alphabet can be displayed using numbers in the 900-range, as seen here:

http://chemistry.about.com/od/chartstables/a/htmlgreek.htm

Using the ALT+number method in the forum yields different characters than indicated in the above web page.  For instance ALT+916 yields ö instead of the letter delta.  What I've done in the past is to type the code into Word and then copy and paste the letter into the forum, thus I get Δ.

I noticed last night for the first time that when I type the codes into Word on my home computer I get the same characters as displayed in the forum, but when I do it on my work computer I get the Greek letters.  My work computer has a newer version of Word, so maybe that's the reason.  (Of course neither version is exactly "new".  I use Office 2003 at work and Office 2000 at home.)

Unfortunately, nothing I do at TheSpaceRace works.  I can't get that form to display special characters no matter what I do.

Offline Bob B.

  • Jupiter
  • ***
  • Posts: 819
  • Bob the Excel Guru™
    • Rocket & Space Technology
Re: ASCII Characters
« Reply #18 on: January 28, 2013, 10:55:24 AM »
Just for reference, below is the entire Greek alphabet - typed into Word using the 900-series codes and then copied and pasted here:

Α α Β β Γ γ Δ δ Ε ε Ζ ζ Η η Θ θ Ι ι Κ κ Λ λ Μ μ Ν ν Ξ ξ Ο ο Π π Ρ ρ Σ σ ς Τ τ Υ υ Φ φ Χ χ Ψ ψ Ω ω

Offline Bob B.

  • Jupiter
  • ***
  • Posts: 819
  • Bob the Excel Guru™
    • Rocket & Space Technology
Re: ASCII Characters
« Reply #19 on: January 28, 2013, 11:00:56 AM »
Another solution is to add a Symbol font to the forum.  Is that possible?

Offline cjameshuff

  • Mars
  • ***
  • Posts: 326
Re: ASCII Characters
« Reply #20 on: January 28, 2013, 11:40:27 AM »
Another solution is to add a Symbol font to the forum.  Is that possible?

There's already LaTeX support.

\alpha \theta o \tau \beta \vartheta \pi \upsilon \gamma \iota \varpi \phi \delta \kappa \rho \varphi \epsilon \lambda \varrho \chi \varepsilon \mu \sigma \psi \zeta \nu \varsigma \omega \eta \xi

 \Gamma \Lambda \Sigma \Psi \Delta \Xi \Upsilon \Omega \Theta \Pi \Phi

...though the broken preview makes it a bit of a pain to use...
« Last Edit: January 28, 2013, 11:43:11 AM by cjameshuff »

Offline Not Myself

  • Earth
  • ***
  • Posts: 217
  • Unwanted Irritant
Re: ASCII Characters
« Reply #21 on: January 28, 2013, 11:44:27 AM »
Just for reference, below is the entire Greek alphabet - typed into Word using the 900-series codes and then copied and pasted here:

Α α Β β Γ γ Δ δ Ε ε Ζ ζ Η η Θ θ Ι ι Κ κ Λ λ Μ μ Ν ν Ξ ξ Ο ο Π π Ρ ρ Σ σ ς Τ τ Υ υ Φ φ Χ χ Ψ ψ Ω ω

The 900-series codes are Unicode code points.  When I cut the above text here, paste it into a text editor on my computer, and then write a quick C program to print out the decimal values of the bytes, I find that each Greek letter above is two bytes, beginning with either 206 or 207 decimal.  I haven't specifically checked, but I'm pretty confident this is UTF-8.
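
The byte check described above can be sketched in Python (not the original C program, which is not shown in the thread). The letters are built from their code points here, an assumption made only to keep the example independent of copy-and-paste:

```python
# Build the Greek capital and small letters from their Unicode code points
# (U+0391..U+03A9, skipping the unassigned U+03A2, and U+03B1..U+03C9),
# then inspect the bytes of each letter's UTF-8 encoding.
capitals = [chr(cp) for cp in range(0x391, 0x3AA) if cp != 0x3A2]
smalls = [chr(cp) for cp in range(0x3B1, 0x3CA)]
letters = capitals + smalls

for letter in letters:
    encoded = letter.encode("utf-8")
    print(letter, list(encoded))  # decimal value of each byte

lengths = {len(letter.encode("utf-8")) for letter in letters}
lead_bytes = {letter.encode("utf-8")[0] for letter in letters}
print(lengths, lead_bytes)
```

Every letter comes out as two bytes, and the first byte is always 206 or 207, which matches the observation above.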

The earlier codes you linked were not UTF-8 encodings but CP-437 codes, and I'm surprised they worked at all.  I suspect, as cjameshuff proposed, that your browser was intelligent about it and converted the CP-437 encodings to UTF-8 encodings when shipping them off to the board.

If I may ask for a confirmation - the trouble you have at this board occurs when you try to enter the 900-series codes into a browser window, then post?

I think I understand what is happening.  When you enter the 916 code, which is supposed to specify the Unicode code point for a Greek delta (U+0394), you are getting an ö.  In the CP-437 encoding, this letter has the eight-bit code 148 (decimal).  In hexadecimal, 916 is 0x394, and 148 is 0x94.  I suspect that is not coincidence.  So you are entering the Unicode code point for a Greek delta, but your browser thinks you want CP-437, and misinterprets your input (also throwing away the "3" digit, since CP-437 codes are all two hexadecimal digits).
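
The arithmetic behind that suspicion can be checked with a short Python sketch. Keeping only the low eight bits is the conjecture from this post, not a confirmed description of what Windows or the browser does:

```python
# Conjecture from the post: the typed number 916 is cut down to its
# lowest eight bits and the result is looked up in CP-437.
alt_code = 916

low_byte = alt_code & 0xFF                   # drop everything above 8 bits
intended = chr(alt_code)                     # character at code point 916
observed = bytes([low_byte]).decode("cp437") # CP-437 character for low byte

print(hex(alt_code), hex(low_byte))
print(intended, observed)
```

The low byte of 0x394 is 0x94 (148 decimal), and CP-437 maps 148 to ö, which is the character that actually appears.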

So I would have a look around all the browser settings, to see how you have "encoding" or something similar set.  I suspect it is set for "CP-437", "US", or something like that.  If so, and you have a UTF-8 option, try changing to that, and see if it works any better.

« Last Edit: January 28, 2013, 11:47:39 AM by Oxyartes »

Offline Not Myself

  • Earth
  • ***
  • Posts: 217
  • Unwanted Irritant
Re: ASCII Characters
« Reply #22 on: January 28, 2013, 11:46:19 AM »
Let's see if this works.

\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{u^{2}}{2}}d u=1

Offline Bob B.

  • Jupiter
  • ***
  • Posts: 819
  • Bob the Excel Guru™
    • Rocket & Space Technology
Re: ASCII Characters
« Reply #23 on: January 28, 2013, 12:00:16 PM »
If I may ask for a confirmation - the trouble you have at this board occurs when you try to enter the 900-series codes into a browser window, then post?

Correct.  It's when I'm in a reply text box in the Browser.

Quote
So I would have a look around all the browser settings, to see how you have "encoding" or something similar set.  I suspect it is set for "CP-437", "US", or something like that.  If so, and you have a UTF-8 option, try changing to that, and see if it works any better.

Thanks.  I'll look into that when I get an opportunity.

Offline grmcdorman

  • Venus
  • **
  • Posts: 85
Re: ASCII Characters
« Reply #24 on: January 28, 2013, 01:46:30 PM »
In Firefox, the default encoding is in Options|Content, in the dialog box shown by Advanced... under Fonts & Colors.

However, the server can also specify, in the headers, the character set it is using. From a quick web search, user agents (that is, browsers) can choose to return POST content using the same value. It is also possible to explicitly specify the character set to be used in a FORM (i.e. the type-in boxes for posting messages).

Inspecting the content of this page, the Quick Reply box does just that; see the following. Note the accept-charset="UTF-8". That means that, at least for that box, the browser must send UTF-8 to the server.
Code:
<form action="http://www.apollohoax.net/forum/index.php?action=quickmod2;topic=336.15" method="post" accept-charset="UTF-8" name="quickModForm" id="quickModForm" style="margin: 0;" onsubmit="return oQuickModify.bInEditMode ? oQuickModify.modifySave('ff2e0a69ff2041e702688f31cd533f0f', 'adc929d5') : false">
For reference, here are the headers supplied by this server; note that the character set is UTF-8.
Code:
HTTP/1.1 200 OK
Date: Mon, 28 Jan 2013 18:39:48 GMT
Server: Apache/2.2.22 (Unix) mod_ssl/2.2.22 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_fcgid/2.3.5
X-Powered-By: PHP/5.3.15
Pragma: no-cache
Cache-Control: private
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Set-Cookie: PHPSESSID=a1caf8341533381e6b1f79998bce9669; path=/
Last-Modified: Mon, 28 Jan 2013 18:39:48 GMT
Connection: close
Content-Type: text/html; charset=UTF-8
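
As a rough illustration of what sending UTF-8 means for a posted form, here is a Python sketch. It reads the charset from a Content-Type value like the one above and encodes a form field with it. The field name `message` is hypothetical, and real browsers have their own logic for choosing the encoding:

```python
from email.message import Message
from urllib.parse import urlencode

# Parse the charset parameter out of a Content-Type header value.
header = Message()
header["Content-Type"] = "text/html; charset=UTF-8"
page_charset = header.get_content_charset()  # returned in lower case

# Encode a form field the way a UTF-8 POST body would carry it.
# "message" is a made-up field name used only for this example.
form_body = urlencode({"message": "\u0394"}, encoding=page_charset)

print(page_charset)
print(form_body)
```

A Greek capital delta travels as the two percent-encoded bytes CE 94, the same two-byte UTF-8 form seen earlier in the thread.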

ETA: Bob B., what browser are you using? (product, e.g. IE, and version, e.g. 9).

Offline Bob B.

  • Jupiter
  • ***
  • Posts: 819
  • Bob the Excel Guru™
    • Rocket & Space Technology
Re: ASCII Characters
« Reply #25 on: January 28, 2013, 02:51:33 PM »
Bob B., what browser are you using? (product, e.g. IE, and version, e.g. 9).

At work I'm using IE8.  I don't remember what I'm using at home, but probably the same.

Offline Bob B.

  • Jupiter
  • ***
  • Posts: 819
  • Bob the Excel Guru™
    • Rocket & Space Technology
Re: ASCII Characters
« Reply #26 on: January 28, 2013, 02:59:36 PM »
So I would have a look around all the browser settings, to see how you have "encoding" or something similar set.  I suspect it is set for "CP-437", "US", or something like that.  If so, and you have a UTF-8 option, try changing to that, and see if it works any better.

I just found "Encoding" listed under the "View" menu.  It is currently set to "Unicode (UTF-8)".

Offline Not Myself

  • Earth
  • ***
  • Posts: 217
  • Unwanted Irritant
Re: ASCII Characters
« Reply #27 on: January 28, 2013, 09:30:01 PM »
Very strange.

I think there are two distinct phenomena at work here.  One is an issue with the way the other board is set up (this is a conjecture, not proved).  The other has to do with the way your computer is set up.

This page

http://en.wikipedia.org/wiki/Alt_code

suggests that a registry hack is needed in Windows to get the Unicode (the 900-series) codes to work.  It also explains how the CP-437 codes (which you seem able to use successfully here, but not at the other board) get converted to UTF-8 - this seems to have been a deliberate measure by Microsoft to maintain compatibility with what had become a popular input method.

So I think that explains pretty much all the behaviour you see at this board, except that you are able to use the 900-series codes in Word.  I wonder if specific applications have the ability to access/override the normal key handling methods, and the new version of Word has chosen to do this.

Regarding the other board, I have been able to replicate the behaviour you describe in PMs to myself - the Greek letters (entered in Unicode) look fine in the preview, then become question marks in the final version.  I suspect (no proof whatsoever) that the final "post" fails to specify (or specifies incorrectly) the encoding, so the nice Unicode characters in the preview are forced to be converted to CP-437 or Windows-1252 or something like that, and they just get killed instead.

So if you are willing to futz around with your registry, you could enable the Unicode alt-codes (like the 900-series).  It sounds like you could still use the CP-437 codes (the ones from 128-255); just pick whichever is most convenient in a given situation.  I think this would probably eliminate the need to go through Microsoft Word, and would get the 900-series codes working on your other computer as well, at least at this board.

I suspect the issue at the other board is beyond your control.  I'll see if I can have a look at the page source, and work out what the encoding is.

Offline LunarOrbit

  • Administrator
  • Jupiter
  • *****
  • Posts: 690
    • ApolloHoax.net
Re: ASCII Characters
« Reply #28 on: January 28, 2013, 09:49:47 PM »
I think there are two distinct phenomena at work here.  One is an issue with the way the other board is set up (this is a conjecture, not proved).

This is probably true, but I can't figure out what the difference is. Both forums are using the same software, on the same server. TheSpaceRace.com is over a decade old now, and has gone through many software updates, so I'm thinking the database might be corrupt. ApolloHoax.net, on the other hand, is less than a year old (in its current incarnation) and has a "fresher" database. It's only had a few minor updates installed.

The weird thing is that I can copy & paste Bob's Greek characters into the forum, hit preview and see the characters fine... it's only after I have saved the post that the characters get converted into question marks. So I think the MySQL database is doing the character conversion, not the forum software.
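
The question-mark symptom is what a lossy charset conversion looks like. The Python sketch below imitates it by forcing Greek text into Latin-1 with replacement. It assumes, without any confirmation from the thread, that some table or connection on the older forum uses a Latin-1 style character set; it does not touch MySQL itself:

```python
# Imitate saving Greek text through a storage layer limited to Latin-1.
# Assumption: something on the save path cannot represent Greek letters;
# the actual database configuration is not known from the thread.
text = "\u0394 \u03b4"  # capital and small delta, separated by a space

stored = text.encode("latin-1", errors="replace")  # unmappable chars -> "?"
shown = stored.decode("latin-1")

print(stored)
print(shown)
```

A preview that never goes through the save path would still show the original letters, which fits the behaviour described above.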

Quote
The other has to do with the way your computer is set up.

Maybe, but like I said, I can reproduce the same problem that Bob is experiencing, so I don't think it's related to his computer.

Quote
I'll see if I can have a look at the page source, and work out what the encoding is.

The forum software is configured to use UTF-8 and the template also has UTF-8 declared in the header:

Code:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
That is another strange thing about the problem. Both forums are not only using the same software, they're both using modified versions of the same template.
« Last Edit: January 28, 2013, 10:06:36 PM by LunarOrbit »
It suddenly struck me that that tiny pea, pretty and blue, was the Earth.
I put up my thumb and shut one eye, and my thumb blotted out the planet Earth.
I didn't feel like a giant. I felt very, very small.
- Neil Armstrong (1930-2012)

Offline Not Myself

  • Earth
  • ***
  • Posts: 217
  • Unwanted Irritant
Re: ASCII Characters
« Reply #29 on: January 28, 2013, 10:02:58 PM »
Quote from: LunarOrbit
Quote
The other has to do with the way your computer is set up.

Maybe, but like I said, I can reproduce the same problem that Bob is experiencing, so I don't think it's related to his computer.

Yes, that's right.  I did not mean to suggest there was something idiosyncratic about his computer; by default, Windows computers apparently do not accept the Alt-key codes for Unicode code points.

The really weird thing is that he reports these codes do work in MS Word.  I guess that software must have its own key-handling code that overrides the system.