Parsing text files



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 17/10/2007 at 23:06, xxxxxxxx wrote:

    User Information:
    Cinema 4D Version: 10.111
    Platform: Mac OSX
    Language(s): C++

    ---------
    I have a text file that I'm trying to use to set some render settings.

    For example, a few lines of my file are:

    Resolution X=320
    Resolution Y=240
    Film Format X=320
    Film Format Y=240

    I'd like to be able to break these lines down into 2 sets of strings, one for Setting and the other for Value.

    I thought I had it working, but something is amiss. The code works when I'm trying to print out the whole file, but when I insert the test for a specific string in the file, it doesn't work. Any help or guidance would be appreciated.

    Here's what I've tried:

    LONG i = 0;
    String singleCharacter;
    String line = " ";
    VLONG fileLength = bFile->GetLength();
    CHAR c;
    CHAR* pc = &c;

    while(i < fileLength)
    {
       bFile->ReadChar(pc);
       singleCharacter = pc;
       if(singleCharacter != "\n")
       {
         line = (line + pc);
         if(line == "Resolution X=") GePrint("SUCCESS!");
       }
       else
       {
         line = " ";
       }

       i++;
    }
          
    Having this line in there causes problems, but I don't think it is the actual source of the problem:
    if(line == "Resolution X=") GePrint("SUCCESS!");




    On 18/10/2007 at 00:23, xxxxxxxx wrote:

    Whitespace (usually space, tab, or any character with an ASCII value of 32 or less) is your friend, so tokenize on it. A better format would be:

    ResolutionX 320(End-of-Line)
    FilmFormatX 320(End-of-Line)

    Read the entire line into a String first. After you read the line, you need to get tokens (words separated by whitespace). Now you need to see what tokens it has. The first token, "ResolutionX" gives you the parameter. The second token "320" gives you the parameter's value. This makes it easier to encapsulate the format parts.

    As a general rule, don't check for tokens while reading the line. Read the line and check for tokens (and respond or store the tokens for later response). I'd supply my code but I have two separate classes for this procedure: FileReader and StringTokenizer. And they're a bit complex for the circumstances being dealt with.

    String singleChar;
    String line = "";
    String token;
    VLONG fileLength = bFile->GetLength();
    CHAR c;
    CHAR* pc = &c;
    LONG pos;
    LONG resX;

    for (VLONG i = 0L; i != fileLength; ++i)
    {
         bFile->ReadChar(pc);
         singleChar = pc;
         // Line read from the file
         if (singleChar == "\n")
         {
              // Get first token - delimited by space
              if (line.FindFirst(' ', &pos))
              {
                   token = line.SubStr(0L, pos);
                   if (token == "ResolutionX")
                   {
                        // Get second token
                        ++pos;
                        token = line.SubStr(pos, line.GetLength()-pos);
                        resX = token.StringToLong();
                   }
              }
              line = "";
         }
         // Add character to string
         else
         {
              line += singleChar;
         }
    }

    Couple of notes:

    * Since this was written off the cuff, it may require some finesse.

    * Note that the correct initialization of line is "" and not " " which already introduces a spurious space character. For the most part, Strings are initialized to zero characters (i.e.: "");

    * Note that this is only good for a list of "parameter value" lines strictly. That is, it is not a general parser. Tabs as whitespace, initial or ending whitespace, multiple "parameter values" per line will confound this.
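
    The first-token / second-token split described above can be sketched in standard C++. This is only an illustration under stated assumptions: std::string stands in for the C4D String class, and SplitSetting is a hypothetical helper name, not part of the SDK. The delimiter is a parameter, so the same routine covers a space, "=", or a tab.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: splits "Setting<delim>Value" at the FIRST delimiter.
// Everything after that delimiter is the value, so the value may itself
// contain the delimiter (e.g. spaces in a file path).
// Returns false when the delimiter is missing and leaves the outputs untouched.
bool SplitSetting(const std::string& line, char delim,
                  std::string& name, std::string& value)
{
    const std::string::size_type pos = line.find(delim);
    if (pos == std::string::npos)
        return false;
    name  = line.substr(0, pos);
    value = line.substr(pos + 1);
    return true;
}
```

    Called as SplitSetting("Resolution X=320", '=', name, value), this leaves "Resolution X" in name and "320" in value; the value can then be converted with std::stol.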




    On 18/10/2007 at 08:14, xxxxxxxx wrote:

    Thank you very much, Robert. I'll try it out tonight and see if it works for me.

    I don't know if I can get rid of the whitespace, though, in my text file. Some settings, like save paths and object names are very likely to have some whitespace in them (even Maxon uses whitespace in its folder structure). It looks like I can use FindFirst to use the "=" to break apart the string. Is that right? If so, I can also substitute tab for =, then, which would work better. You can't use tabs in object names or save paths, so it is a safe character.




    On 18/10/2007 at 09:23, xxxxxxxx wrote:

    You can use "=" as a delimiter. Note that the second token (for the parameter value) is just from after the space to the end of the string. So filepaths and names with spaces won't affect it.

    Filepath C:\Program Files\Maxon\Cinema 4D R10\

    should work. If you are going to allow multiple parameter values, say a vector, you'll need to do some sub-tokenization. Get the parameter value token and then break it up. So if you have:

    Position 10 20 30

    "10 20 30" is the second token but you want to extract each value individually, say, to convert to a LONG or Real value. Tracking the current location within the string with 'pos', you can sequentially grab each token with FindFirst(' ', &pos, start) where 'start' is 1 character after the previous delimiter position (pos+1).

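    The sub-tokenization loop described above might look like the following in standard C++. It is a sketch, not SDK code: std::string::find plays the role of FindFirst, and SplitTokens is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: walks the value token and cuts it at each delimiter.
// 'start' always sits one character past the previous delimiter position.
std::vector<std::string> SplitTokens(const std::string& value, char delim)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (start <= value.size())
    {
        std::string::size_type pos = value.find(delim, start);
        if (pos == std::string::npos)
            pos = value.size();            // last token runs to the end of the string
        if (pos > start)                   // skip empty tokens from repeated delimiters
            tokens.push_back(value.substr(start, pos - start));
        start = pos + 1;
    }
    return tokens;
}
```

    For "10 20 30" this yields three tokens, each of which can be converted separately, for example with std::stol or std::stod.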



    On 18/10/2007 at 14:33, xxxxxxxx wrote:

    Hi!

    I'm having the same problem. I want to read some infos from a txt-file. My code looks similar, but I am having trouble with finding the newline.
    I generated a txt-file with some words and newlines and my code can't find the "\n". I tried your code as well, but there is the same problem. Is txt perhaps the wrong format or do I have to do some other settings before I can read my file?

    Another problem is that when I print the content of my char-variable, there are always 3 additional symbols which aren't in my file. I also tried your code for this, same output.
    The only solution seems to be initializing the char variable with char test[2]= " ".
    If I declare the variable like this: char* test; Cinema 4D crashes.

    Do you know what I am doing wrong and how to find a newline? Isn't there another possibility to read a file with a huge content without reading every char and testing if this is a newline or not? Reading Strings with ReadString() forces also a crash of Cinema 4d.

    I hope it's okay that I use "your" thread for a very similar problem. :)

    Nice greetings from Germany,
    Manuel




    On 18/10/2007 at 19:24, xxxxxxxx wrote:

    Okay, I found a solution for my problem.

    char test[2] = " ";
    bfile->ReadBytes(test,1);
    singleCharacter = test;

    And in this singleCharacter I can find a "\n", perhaps because of conversion from char to String?!

    Another time sorry for using your thread.




    On 18/10/2007 at 19:33, xxxxxxxx wrote:

    > Isn't there another possibility to read a file with a huge content without reading every char and testing if this is a newline or not?

    Yep. You can read a file char by char or you can use C or the C++ standard library to read lines (fgets() or fstream). The problem with reading lines into a fixed buffer, as fgets() does, is that you need a buffer big enough for any situation. For instance, people sometimes strip line endings from source code text files to reduce size (though this is not required these days). You'll end up reading the entire file into a single line buffer.

    I do it char by char using BaseFile::ReadChar() because I want to avoid any implementation differences between VisualStudio, CodeWarrior, and Xcode in interpreting bytes with fgets() for instance - i.e.: endian, end-of-line (different on MacOS, Windows, and Linux), etc.

    Whatever you are more comfortable with and works for you.
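
    For the line-reading route, a minimal sketch using the C++ standard library rather than BaseFile is shown below. std::getline grows the std::string as needed, so no fixed-size line buffer is involved. ReadAllLines is a hypothetical helper name, and the sketch assumes lines end in LF or CR+LF; a file that uses CR alone would come back as one long line.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every line from a stream into a vector.
std::vector<std::string> ReadAllLines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))        // splits on '\n'
    {
        // A CR+LF file read without text-mode translation leaves a trailing CR
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

    The same function accepts a std::ifstream opened on the settings file or a std::istringstream holding test data.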

    > Reading Strings with ReadString() forces also a crash of Cinema 4d.

    That's because ReadString() is for reading C4D Strings written with WriteString(). These are not char* or the STL String class. These are C4D String classes which save in a particular format (4 bytes number-of-characters followed by the string and a null-terminator - see BaseFile::ReadString() in Resource:_api:c4d_file.cpp).




    On 18/10/2007 at 19:41, xxxxxxxx wrote:

    Quote: Originally posted by Shaden1 on 18 October 2007
    >
    > * * *
    >
    > Okay, I found a solution for my problem.
    >
    > char test[2] = " ";
    > bfile->ReadBytes(test,1);
    > singleCharacter = test;
    >
    > And in this singleCharacter I can find a "\n", perhaps because of conversion from char to String?!
    >
    > Another time sorry for using your thread.
    >
    >
    > * * *

    I'd be careful there. It is getting "\n" only because you are on MacOS. Ask how I know this. ;) Windows uses 0x0D+0x0A (CR+LF) for end-of-lines. Classic MacOS, on the other hand, uses 0x0D (CR) only, while Unix and Mac OS X tools normally write 0x0A (LF) only. For Windows, you'll need to read two bytes to get past the newline. It is possible just to look for 0x0D but then you have to be prepared for a possible 0x0A. This can happen on either OS.

    That aside, the other problem I see you have just corrected: a char array used as a char* (string) needs to be null-terminated (end with a byte = 0). If you were reading one character into a one-element array and then trying to convert that into a string, you can see why there were problems. All strings need to be ABCDEFG\0, and this applies to char arrays being treated as strings as well. Always make the char array one bigger than the expected maximum number of characters and set the element after the last character to 0 after setting the 'string':

    char test[8];
    test[0] = 'H';
    test[1] = 'i';
    test[2] = '!';
    test[3] = 0;
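
    Applied to the single byte read from a file, the same rule can be sketched as follows. This uses standard C++ with std::string standing in for the C4D String class, and CharToString is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wraps one byte in a two-element, null-terminated
// buffer so that it can safely be treated as a C string.
std::string CharToString(char c)
{
    char buf[2];
    buf[0] = c;
    buf[1] = 0;   // terminator: without it the conversion reads past the array
    return std::string(buf);
}
```

    With the terminator in place, the resulting string has exactly one character and compares equal to "\n" when the byte read was a line feed.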




    On 19/10/2007 at 02:52, xxxxxxxx wrote:

    I am sorry, but I am not working with MacOS. It seems to work with windows too, but I have to think about your additional comments.

    Now I understand why ReadString crashes...should perhaps be documented in the SDK-Documentation, because "Read a string from the file." can be everything for me. ;)

    And because I didn't add a 0 at the end, there was always trash in my converted string, right? I will test this later, thank you.

    With ReadChar, I am not able to find "\n", but I don't know what I am doing wrong. ReadBytes causes problems with MacOS then, I know. But the first thing is that my plugin is running. All these things will be corrected at the end. :)

    Thank you very much for your replies. You are really great!

    Nice greetings,
    Manuel




    On 19/10/2007 at 11:39, xxxxxxxx wrote:

    Hi Manuel,

    Don't worry. You're not hijacking the thread at all. As long as we're both trying to solve the same problem, it really should be contained in one thread.

    About ReadChar, I believe it won't detect "\n" because that is a string value, not an ASCII character value.

    Here's a character table for ASCII values.

    http://www.asciitable.com/

    I have only a basic understanding of this stuff, so please, someone correct me if I'm wrong.




    On 19/10/2007 at 12:44, xxxxxxxx wrote:

    You can always check the character representation: '\n'. But as noted, I wouldn't completely trust that. A Windows build will expect the two character end-of-line and if the file was created on a Mac, it will be interpreted incorrectly (as a Carriage Return CR). Simply load any Mac text file into NotePad and note the result. This is because Windows isn't finding CR+LF, only CR.

    As I explained, the best way around this is to ReadChar() and check for 0x0D (CR). That will definitively denote an end-of-line (except for Unix where LF is used - but that should be rare if ever). To be certain though, you have to go one more character to see if there is a 0x0A (LF) for potential Windows files - otherwise it will be added to the next line/token read and wreak havoc.
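
    A minimal sketch of that check in standard C++, assuming the file contents are already in memory as raw bytes (SplitLines is a hypothetical helper name, not part of the SDK). Unlike the approach of skipping leading whitespace, this version keeps empty lines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits raw bytes into lines, accepting CR, LF,
// or CR+LF as the end-of-line marker.
std::vector<std::string> SplitLines(const std::string& data)
{
    std::vector<std::string> lines;
    std::string line;
    for (std::string::size_type i = 0; i < data.size(); ++i)
    {
        const char c = data[i];
        if (c == 0x0D)                                      // CR
        {
            if (i + 1 < data.size() && data[i + 1] == 0x0A)
                ++i;                                        // swallow the LF of a CR+LF pair
            lines.push_back(line);
            line.clear();
        }
        else if (c == 0x0A)                                 // lone LF
        {
            lines.push_back(line);
            line.clear();
        }
        else
        {
            line += c;
        }
    }
    if (!line.empty())
        lines.push_back(line);                              // last line without a terminator
    return lines;
}
```

    Looking one byte ahead after a CR is what keeps a stray LF from being glued onto the start of the next line.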

    Here's my FileReader::ReadLine() method. Note that I check for either CR or LF (this supports Windows, MacOS, Unix). If there are two characters representing the end-of-line, note that leading 'whitespace' is always skipped. That includes CR and LF as their numerical values are less than 32 (Space). So the next call to ReadLine() skips any spurious end-of-line characters remaining - including empty lines.

    Note that I use a file buffer. This is a large buffer that holds as much of the file as possible - FillBuffer() uses ReadBytes() to bring the file in clumps. It is done this way since I'm dealing with text files of many megabytes (hundreds even). Note that it also simplifies the line reading process to avoid ReadChar() continuously.

    // Read a line from file (up to EOL or EOF),
    // skipping leading whitespace, even blank lines
    //*---------------------------------------------------------------------------*
    CHAR* FileReader::ReadLine()
    //*---------------------------------------------------------------------------*
    {
         // Check for ESC key (abort load)
         if (GetInputEvent(BFM_INPUT_KEYBOARD, keyinput) && (keyinput.GetLong(BFM_INPUT_CHANNEL) == KEY_ESC))
         {
              abort = TRUE;
              return NULL;
         }

         // Step 1: Skip leading whitespace
         fbuf = fbufptr;
         do {
              // Reached end of file buffer, read more
              if ((fbufptr == fbufend) && !FillBuffer()) return NULL;

              c = *fbufptr;
              ++fbufptr;
         } while (c <= UNICODE_SPACE);
         bytesRead += (fbufptr-fbuf);

         // Step 2: Read line into lbuffer until EOL (or EOF)
         fbuf = fbufptr;
         lbufptr = lbuffer;
         do {
              // Buffer overflow - line equal to or longer than BUFFER_SIZE
              if (lbufptr == lbufend) return (CHAR*)ErrorException::NullThrow(EE_DIALOG, GeLoadString(IPPERR_LINETOOLONG_TEXT), filename.GetString(), GetLineString());

              // Store character into line buffer
              *lbufptr = c;
              ++lbufptr;

              // Reached end of file buffer, read more
              if ((fbufptr == fbufend) && !FillBuffer()) return NULL;

              c = *fbufptr;
              ++fbufptr;
         // - PC uses CR+LF (0x000D+0x000A), Mac uses CR (0x000D)
         } while ((c != UNICODE_CR) && (c != UNICODE_LF));

         bytesRead += (fbufptr-fbuf);
         // Set Status Bar Progression
         StatusSetBar(bytesRead / statusConstant);
         *lbufptr = 0;
         return lbuffer;
    }




    On 13/02/2008 at 11:25, xxxxxxxx wrote:

    Hi Guys,
    I am currently writing a plug that needs to read a simple text file. This thread has been a ton of help, but I still have a problem when converting a CHAR to a String. Whenever I assign a char to a string, Cinema seems to append about 8 characters that look like a one '1' to the end of each character. Here's the code:

    Bool HybridDopey::Command(LONG id, const BaseContainer &msg)
    {
         String *dopeStr = NULL;
         String singleChar = "", line = "", token;

         CHAR c;
         CHAR *pc = &c;

         LONG pos;
         switch (id)
         {
              case IDC_HD_SETUP_BUTTON:
                   if(!dopeSheetFileGUI) return TRUE;

                   //get the file name from the filename GUI
                   dopeSheetFile = dopeSheetFileGUI->GetData().GetValue().GetFilename();

                   if(!file) return TRUE;

                   //open the file for reading
                   if(!file->Open(dopeSheetFile)) return TRUE;

                   //get the length of the file
                   VLONG fileLen = file->GetLength();

                   GePrint("Entering For Loop");
                   for(VLONG i = 0L; i != fileLen; i++)
                   {
                        if(file->ReadChar(pc))
                             //GePrint("Read Char");
                        singleChar = String(pc, St8bit);

                        GePrint(singleChar);

                        //line read from the file
                        if(*pc == 'CR' || *pc == ',')
                        {
                             GePrint("Found End of Line");
                             //get the first token
                             if(line.FindFirst("", &pos))
                             {
                                 token = line.SubStr(0L, pos);
                                 GePrint("Found Token");
                             }
                        }else{
                             line += singleChar;
                        }
                   }

                   GePrint("The Final Line is: " + line);
                   GePrint(token);

                   file->Close();

                   break;
         }

         return TRUE;
    }

    Also, the text file that I am reading only has the word "test," in it.

    Thanks in advance for any help.
    Josh




    On 13/02/2008 at 14:40, xxxxxxxx wrote:

    Just as some extra info: This simple code gives me a total of 40 characters when I assign the CHAR array to a string (5 characters * 8);

    CHAR w[5] = { 'w', 'h', 'a', 't', '!' };
    String wStr = w;
    LONG ogStringLen = wStr.GetLength();

    GePrint(LongToString(ogStringLen));

    This will output 40 to the console, which makes ABSOLUTELY no sense to me! The only way to get the string to behave as I would expect it to is to add the following code:

    LONG stringLen = wStr.GetCString(w, wStr.GetCStringLen(StXbit)+1);
    wStr.Delete(5, stringLen - 5);

    But performing this while parsing a text file is going to make the plugin very slow. Does any one else have this problem when assigning a char to a string?

    Josh




    On 13/02/2008 at 15:23, xxxxxxxx wrote:

    The char array must be Null-terminated, i.e.:

    CHAR w[6] = { 'w', 'h', 'a', 't', '!', 0 };

    The string length doesn't include the null-terminator. You are probably seeing 40 because, without a terminator, the conversion keeps reading past the end of the array, and the first byte with a 0 value happens to be 40 bytes from its start.
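
    When adding a terminator is not convenient, the alternative is to state the length explicitly. The sketch below shows this in standard C++, where std::string has a (pointer, count) constructor; whether the C4D String class offers an equivalent is not confirmed here, and FromCharArray is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a string from a character array that has
// no null terminator by passing the element count explicitly.
template <std::size_t N>
std::string FromCharArray(const char (&chars)[N])
{
    return std::string(chars, N);   // copies exactly N bytes, never reads beyond them
}
```

    For the five-element array { 'w', 'h', 'a', 't', '!' } the result has length 5, because the count comes from the array type rather than from a search for a 0 byte.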




    On 24/04/2008 at 05:48, xxxxxxxx wrote:

    The problem Josh has, is because of the pointers/references.

    At this point: singleChar = String(pc, St8bit); he converts data to a string, without knowing that it is just a CHAR!

    So using 'c', and not its pointer 'pc': singleChar=String(c, St8bit); will do.

    Here is a little example:

    >      CHAR test=100;
    >      String str, strRef;
    >      str.Insert(0, test);
    >      strRef.Insert(0, &test);
    >      GePrint(str);
    >      GePrint(strRef);

    You will see strRef has the right character in the first place, but it is followed by converted random memory.

    The same thing should happen in Robert's code. I wonder why nobody noticed, but it is a good example of how dangerous pointers are ;-).




    On 24/04/2008 at 06:32, xxxxxxxx wrote:

    Just noticed, using singleChar=String(c, St8bit) obviously won't work, but I think you get the idea (yes I know this thread is old ;-) ...future reference etc)

