isspace(), strings, and unicode



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 08/02/2003 at 00:43, xxxxxxxx wrote:

    User Information:
    Cinema 4D Version:   8.012 
    Platform:   Windows  ;   
    Language(s) :   C.O.F.F.E.E  ;

    ---------
    I'm reading a file one character at a time (for parsing out tokens).  Being new to COFFEE, I have several questions which could not be answered by the documentation (experimentation is killing me - modifying, copying, opening C4D, checking, closing C4D, repeat).
    1. How does one read single characters from a text file other than using BaseFile->ReadString(1, GE_XBIT)?  The (Unicode) character read must be convertible to a string for concatenation with stradd() in order to "build" the token.
    2. What exactly is considered a space by isspace()?  Are tabs, spaces, newlines all considered in the check?
    3. Following (2.), if isspace() only considers spaces or spaces and tabs, how does one check for the newline '\n' character?  What is its Unicode value?  Does COFFEE understand "\n" as a valid string?  So far in testing, it appears not.
    The differences between COFFEE and C++/Java have not been enumerated thoroughly.  My knowledge of C/C++/Java is extensive (15+ years), which leaves me feeling more confused and inept: I can only fall back on that knowledge when there are gaps in the documentation, and I find no correlations.
    Thanks and please help,
    Robert Templeton




    On 08/02/2003 at 02:05, xxxxxxxx wrote:

    Quote: Originally posted by kuroyume0161 on 08 February 2003
    >
    > * * *
    >
    > 1. How does one read single characters from a text file other than using BaseFile->ReadString(1, GE_XBIT)?  The (Unicode) character read must be convertible to a string for concatenation with stradd() in order to "build" the token.
    You can use the other ReadX() functions, but then you'd have to handle the Unicode conversion yourself. I don't understand what problem you have with ReadString() and stradd().
    > 2. What exactly is considered a space by isspace()?  Are tabs, spaces, newlines all considered in the check?
    Most of the string functions in C.O.F.F.E.E. are equivalent to their ANSI C namesakes. The C isspace(c) returns TRUE if c is in 0x09–0x0D or 0x20. The PC \n newline (0x0A) is included in this range.
    > 3. Following (2.), if isspace() only considers spaces or spaces and tabs, how does one check for the newline '\n' character?  What is its Unicode value?  Does COFFEE understand "\n" as a valid string?  So far in testing, it appears not.
    "\n" should be a valid string. However, a better way is to use GeGetLineEnd() to get a platform-specific line ending. (You should be prepared to handle Mac files on a PC and vice versa though, so the best approach is probably to check for all combinations of 0x0A and 0x0D.)

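For comparison, here is a minimal sketch in plain C (not C.O.F.F.E.E.) of the two points in the reply above: which codes ANSI C isspace() accepts in the default "C" locale, and how a reader might treat CR+LF, a lone LF, and a lone CR each as a single line break. The function names `count_c_locale_spaces` and `count_line_breaks` are hypothetical, and whether C.O.F.F.E.E.'s isspace() matches the C behaviour exactly is an assumption based on the reply.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts how many codes in 0..127 isspace() accepts.
   In the default "C" locale these are 0x09-0x0D and 0x20. */
int count_c_locale_spaces(void)
{
    int code;
    int count = 0;

    for (code = 0; code < 128; code++) {
        if (isspace(code)) {
            count++;
        }
    }
    return count;
}

/* Counts line breaks in buf[0..len), treating CR+LF (0x0D 0x0A),
   a lone LF (0x0A) and a lone CR (0x0D) each as one break. */
int count_line_breaks(const char *buf, size_t len)
{
    size_t i;
    int breaks = 0;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0x0D) {
            breaks++;
            /* Swallow the LF of a CR+LF pair so it is not counted twice. */
            if (i + 1 < len && buf[i + 1] == 0x0A) {
                i++;
            }
        } else if (buf[i] == 0x0A) {
            breaks++;
        }
    }
    return breaks;
}
```

Calling `count_line_breaks` on the same two-line text saved with PC, Unix, or classic Mac line endings should give the same count, which is the point of checking every combination of 0x0A and 0x0D.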



    On 08/02/2003 at 02:38, xxxxxxxx wrote:

    Hi, Mikael,
    I don't have much of a problem with ReadString() and stradd(), but as each character is read in (as a string in this case), "\n", tabs, and spaces must be checked to either skip leading whitespace or end the token.
    As it goes, doing a strcmp(inputchar, "\n") (where inputchar is a string of length=1) doesn't seem to be working.  isspace() takes an int and I have no idea how COFFEE converts strings into int values representing characters (or vice versa).  So, if I'm reading in length=1 strings, how do I use isspace([int])?
    What I guess I'm trying to do is avoid mixing modes here - strings and [int] chars - unless there is a way to convert back and forth (?).
    I see how strchr() might work as long as I check for, as you wrote, all combinations of 0x0A and 0x0D (which are equivalent ASCII and Unicode, correct?).
    Let me know if any of this makes sense and if there is a conversion process.
    Thanks!
    Robert Templeton




    On 08/02/2003 at 02:42, xxxxxxxx wrote:

    One more thing:
    I take it that there is no way to index strings, for example:

        var mystring = "Yo, world!";
        var character = mystring[4];
        println("result: ", character);

    result: w
    With indexing, I could do just about anything :)
    Robert Templeton




    On 08/02/2003 at 03:04, xxxxxxxx wrote:

    This seems to work nicely:

        TokenReader::IsSpace(onechar)
        {
         // PC uses CR+LF (0x000D+0x000A)
         // Mac uses CR (0x000D)
         // Tab, Space, CR, LF
         return ((0 == strchr(onechar, 0x0009)) ||
                 (0 == strchr(onechar, 0x0020)) ||
                 (0 == strchr(onechar, 0x000D)) ||
                 (0 == strchr(onechar, 0x000A)));
        }

    The good thing is that it doesn't matter which 'newline' characters are used, since they are all stripped away, albeit one at a time.
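The overall approach in this thread (read one character at a time, skip leading whitespace, then end the token at the next whitespace character) can be sketched in plain C as below. This is only an illustration under assumptions, not C.O.F.F.E.E. code: `is_space_code` and `next_token` are hypothetical names, and the input is taken from an in-memory string rather than a BaseFile.

```c
#include <stddef.h>

/* Same four codes as the IsSpace() check above: tab, space, CR, LF. */
static int is_space_code(int c)
{
    return c == 0x09 || c == 0x20 || c == 0x0D || c == 0x0A;
}

/* Extracts the next token from src, starting at *pos.
   Copies at most cap - 1 characters into out and NUL-terminates it.
   Advances *pos past the token and returns the number of characters
   stored (0 when no token is left). */
size_t next_token(const char *src, size_t *pos, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0) {
        return 0;
    }

    /* Skip leading whitespace, one character at a time. */
    while (src[*pos] != '\0' && is_space_code((unsigned char)src[*pos])) {
        (*pos)++;
    }

    /* Collect characters until whitespace or end of input. */
    while (src[*pos] != '\0' && !is_space_code((unsigned char)src[*pos])) {
        if (n + 1 < cap) {
            out[n++] = src[*pos];
        }
        (*pos)++;
    }

    out[n] = '\0';
    return n;
}
```

Because CR and LF are both treated as plain whitespace, a CR+LF pair is simply skipped as two separate characters, so the same loop handles PC and Mac files alike.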
    It's amazing that this is becoming clearer as I'm working around 4 AM (04:00), being awake since 7:30 AM.  See, I can program in my sleep! But not as well when I'm fully awake. ;)
    Robert Templeton




    On 08/02/2003 at 03:14, xxxxxxxx wrote:

    Mikael,
    Thanks for being patient with me and for your valuable assistance so far. :)
    Despite my experience, it is always a mess learning a new programming language (especially when it's an SDK for an application), even if it is a close derivative of languages in which one is already fluent.
    Figured that it would be a struggle at each new juncture, so expect me to blurt in here at each one with a puzzled look of confusion.
    Robert Templeton

