...

There are several national variants of ASCII, so the original ASCII is often referred to as US-ASCII. The international standard ISO 646 defines a character set similar to US-ASCII, but it designates the code positions of the US-ASCII characters @ [ \ ] { | } as "national use positions" and allows some latitude for the characters # $ ^ ` ~. Several national variants of ASCII have been defined under ISO 646, each assigning different letters and symbols to the national use positions; for example, the German variant places Ä, Ö, and Ü at the positions of [, \, and ]. Consequently, the characters in those positions, including their US-ASCII assignments, are less "safe" than others in international data transfer: they might be transferred or interpreted incorrectly.
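The following minimal sketch checks whether a string avoids every ISO 646 position that a national variant may redefine. It assumes that "safe" means printable 7-bit characters other than # $ @ [ \ ] ^ ` { | } ~, and it additionally rejects control characters. The name is_iso646_invariant is a hypothetical helper, not part of any standard library.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if every character of s is a printable
 * 7-bit ASCII character (0x20-0x7E) that is NOT one of the ISO 646
 * positions a national variant may redefine: # $ @ [ \ ] ^ ` { | } ~
 */
static bool is_iso646_invariant(const char *s) {
   static const char variant[] = "#$@[\\]^`{|}~";
   assert(s != NULL);                /* precondition: valid string */
   for (; *s != '\0'; ++s) {
      unsigned char c = (unsigned char)*s;
      if (c < 0x20 || c > 0x7E) {
         return false;               /* control character or not 7-bit ASCII */
      }
      if (strchr(variant, c) != NULL) {
         return false;               /* national use or "latitude" position */
      }
   }
   return true;
}
```

For example, is_iso646_invariant("name.ext") returns true, whereas "user@host" is rejected because @ occupies a national use position.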

...

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
   /* Noncompliant: the file name uses characters outside the safe subset */
   const char *file_name = "»£???«";
   mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;

   int fd = open(file_name, O_CREAT | O_EXCL | O_WRONLY, mode);
   if (fd == -1) {
      /* Handle error */
      return 1;
   }
   close(fd);
   return 0;
}

An implementation is free to define its own mapping of characters outside the "safe" subset. For example, when this code was tested on a Red Hat Linux distribution, it created a file with the following name:

...

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
   /* Compliant: the file name uses only letters and a period */
   const char *file_name = "name.ext";
   mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;

   int fd = open(file_name, O_CREAT | O_EXCL | O_WRONLY, mode);
   if (fd == -1) {
      /* Handle error */
      return 1;
   }
   close(fd);
   return 0;
}
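A program that accepts file names from elsewhere could validate them before calling open(). This minimal sketch assumes the POSIX portable filename character set (A-Z, a-z, 0-9, period, underscore, hyphen) as the safe subset. is_portable_filename is a hypothetical helper, and rejecting a leading hyphen is an extra design choice, since such names are easily mistaken for command-line options.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if name is non-empty, does not start
 * with '-', and contains only A-Z, a-z, 0-9, '.', '_' and '-'.
 * Assumes an ASCII-compatible execution character set (contiguous letters).
 */
static bool is_portable_filename(const char *name) {
   static const char extra[] = "._-";
   assert(name != NULL);             /* precondition: valid string */
   if (name[0] == '\0' || name[0] == '-') {
      return false;                  /* empty, or looks like an option */
   }
   for (const char *p = name; *p != '\0'; ++p) {
      unsigned char c = (unsigned char)*p;
      bool is_letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
      bool is_digit = (c >= '0' && c <= '9');
      if (!is_letter && !is_digit && strchr(extra, c) == NULL) {
         return false;               /* character outside the portable set */
      }
   }
   return true;
}
```

The explicit range checks avoid the locale dependence of isalnum(), but they rely on A-Z and a-z being contiguous, which holds for ASCII; the C standard guarantees contiguity only for the digits 0-9.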

Risk Assessment

Failing to use only the subset of ASCII guaranteed to work can result in misinterpreted data.

...

[Kuhn 06] UTF-8 and Unicode FAQ for Unix/Linux
[ISO/IEC 646-1991] ISO 7-bit coded character set for information interchange
[ISO/IEC 9899-1999] Section 5.2.1, "Character sets"
[MISRA 04] Rule 3.2, "The character set and the corresponding encoding shall be documented," and Rule 4.1, "Only those escape sequences that are defined in the ISO C standard shall be used"