Reading hex string from a stream

Last Edited By Krjb Donovan
Last Updated: Mar 12, 2014 02:46 PM GMT



Could you please point me to a way to read a hex string (from a stream) that was saved by the following code:

   ofstream ww("test.txt");
   Byte bmsg[40];
   // bmsg[0..15] gets binary values here,
   for (int i=0; i<16; i++)
   {
       // Add leading 0s for the broken STL hex support.
       if (0==bmsg[i]) ww << "0";
       if (16>bmsg[i]) ww << "0";
       ww << std::hex << 0+bmsg[i];
   }

A request: the "read" method should be in standard (STL) C++



Well, strictly speaking STL means the Standard Template Library portion of the C++ standard library: containers, iterators, algorithms and the like. It does not include things like IOStreams and Locales, so if you really meant STL only then I think that you are out of luck.

However, if you really meant a hosted implementation of ISO standard C++, which includes the full C++ standard library, then I think you will be in more luck.

As an aside, note that there are at least two versions of ISO Standard C++: the original 1998 version and the primarily bug-fix 2003 update. There are also the TR1 non-normative additions to the C++ standard library (non-normative means an implementation does not need to include such features to comply with the standard).

Looking at the code presented, you do not require the test for a value being zero, as zero is also less than 16! In fact, with both tests in place a zero byte is written as three characters, "000", rather than two.

Also, are you sure your IOStream library implementation is 'broken'? I think it is not, and that your code is 'broken' in that it should ask the stream to format the output as you require - a width of 2 with leading zeros:

   ww << std::hex << std::setw(2) << std::setfill('0') << 0+bmsg[i];

You should include <iomanip> to use std::setw and std::setfill etc.

You can also convert the Byte (a type alias for unsigned char I presume) to an int like so:

   ww << std::hex << std::setw(2) << std::setfill('0') << int(bmsg[i]);
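As a minimal sketch of the corrected output formatting in one place, the helper below writes a run of bytes as contiguous two-digit hex text. The name bytesToHex and the use of a std::ostringstream in place of the file stream are my own choices for illustration, and I am assuming Byte is an alias for unsigned char:

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

typedef unsigned char Byte; // assumption: the question's Byte is unsigned char

// Hypothetical helper: format `count` bytes as contiguous two-digit hex text.
std::string bytesToHex(Byte const * data, std::size_t count)
{
    assert(data != 0 || count == 0);      // a null buffer is only valid when empty
    std::ostringstream out;
    out << std::hex << std::setfill('0'); // base and fill persist: set them once
    for (std::size_t i = 0; i != count; ++i)
        out << std::setw(2) << int(data[i]); // width has to be set for every value
    return out.str();
}
```

The same three manipulators work unchanged on a std::ofstream such as ww.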

You can also set the field width, fill and base directly on the stream:

   std::ofstream ww("test.txt");
   ww.setf(std::ios::hex, std::ios::basefield);
   ww.fill('0');           // the fill character persists, so set it once
   Byte bmsg[40];
   // bmsg[0..15] gets binary values here,
   for (int i=0; i<16; i++)
   {
       ww.width(2);        // *must* set each time as reset after formatted I/O operations
       ww << int(bmsg[i]);
   }

Notice that the width is reset to its default value of 0 after each formatted operation that uses it (e.g. inserting a number via operator<< or extracting a string via operator>>), whereas the other values remain in whatever state they were last set to, so we only need to set them once.
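To see the reset in isolation, here is a small sketch that sets the width, fill and base once and then writes the same value twice. The function name writeTwiceWithOneWidth is made up for this illustration:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical demo: set base, fill and width once, then insert `value` twice.
std::string writeTwiceWithOneWidth(int value)
{
    std::ostringstream out;
    out.setf(std::ios::hex, std::ios::basefield); // persists
    out.fill('0');                                // persists
    out.width(2);                                 // applies to the next insertion only
    out << value;                                 // padded to two characters
    assert(out.width() == 0);                     // the insertion reset the width
    out << value;                                 // still hex, but no longer padded
    return out.str();
}
```

Calling it with 5 gives "055": the first insertion is padded to "05" and the second is written as a bare "5".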

Now reading the values back in again requires that we read them initially as strings of length 2. This is because for formatted input width is _only_ used when reading strings.

   size_t const          MessageSize(40);
   size_t const          HexFieldsBeginIndex(0);
   size_t const          HexFieldsEndIndex(16); // STL style 1 past the end
   std::streamsize const HexFieldWidth(2);
   char const            DataFileName[] = "test.txt";
   // ...
   std::ifstream rr(DataFileName);
   rr.setf(std::ios::hex, std::ios::basefield);
   Byte rmsg[MessageSize];
   if ( rr.is_open() )
   {
     std::cout << "Read strings: ";
     for ( size_t i=HexFieldsBeginIndex; rr && i != HexFieldsEndIndex; ++i )
     {
         std::string strHexValue;
         rr.width(HexFieldWidth); // read at most 2 characters; reset after each string read
         rr >> strHexValue;
         std::cout << strHexValue << ' ';
     }
     std::cout << std::endl;
   }
   else
   {
       std::cerr << "Failed to open data file " << DataFileName << '.' << std::endl;
       return 1; // from example code main
   }

Note that my examples presume they are in an example test program's main function or similar that returns an integer value to the caller.
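The effect of the input width on string extraction can be tried without a data file. The sketch below splits contiguous text into fixed-width fields from a std::istringstream; splitFields is a hypothetical helper name, and std::setw is used as the manipulator form of calling width on the stream:

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split contiguous text into fields of at most
// `fieldWidth` characters using the stream's input width.
std::vector<std::string> splitFields(std::string const & text, int fieldWidth)
{
    assert(fieldWidth > 0); // a width of 0 would mean "no limit"
    std::istringstream in(text);
    std::vector<std::string> fields;
    std::string field;
    // The width is reset after every string extraction, so set it each time.
    while (in >> std::setw(fieldWidth) >> field)
        fields.push_back(field);
    return fields;
}
```

Without the std::setw the first extraction would swallow all the digits in one go, as there is no white space between the fields.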

We could also use the likes of the get or getline input stream member functions to read 2 characters at a time (passing a count of 3, as the count includes room for the terminating zero) - although we would have to read the data into a char array and not have the option of using a std::string:

     for ( size_t i=HexFieldsBeginIndex; rr && i != HexFieldsEndIndex; ++i )
     {
         size_t const HexCStringBufferSize(HexFieldWidth+1);
         char strHexValue[HexCStringBufferSize];
         rr.get(strHexValue, HexCStringBufferSize); // stores at most 2 characters plus a zero
         std::cout << strHexValue << ' ';
     }

Notice that I have gone to the bother of defining names for most of the magic values such as 40, 16, 2 and "test.txt". I am also using a message buffer called rmsg, which is defined ready to be filled in but is not used yet.

Once we have the data in string form - either a std::string or a C-style string - we have to convert this value to an integer value. There are two obvious ways to do this:

   - use a std::istringstream using the read std::string data as an input source 
      and extract the input into an integer
   - use the underlying locale num_get facet support for numeric conversions.

The third option is to do the conversion 'by hand' and write code to do it explicitly.
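I do not go into the 'by hand' option further, but a minimal sketch of it might look like the following. The function names are hypothetical, and the digit arithmetic assumes that the letters 'a' to 'f' and 'A' to 'F' have consecutive character codes, which is true of ASCII but is not guaranteed by the C++ standard:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if `c` is not a hex digit.
// Assumes 'a'..'f' and 'A'..'F' are contiguous, as they are in ASCII.
int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Convert exactly two hex digits to a value in 0..255; returns -1 on bad input.
int hexPairToInt(std::string const & hex)
{
    if (hex.size() != 2) return -1;
    int const high = hexDigitValue(hex[0]);
    int const low  = hexDigitValue(hex[1]);
    if (high < 0 || low < 0) return -1;
    int const value = high * 16 + low;
    assert(value >= 0 && value <= 255); // two valid digits always fit in a byte
    return value;
}
```

This avoids streams and locales altogether, at the cost of doing the validation yourself.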

The first option would look like so:

   for ( size_t i=HexFieldsBeginIndex; rr && i != HexFieldsEndIndex; ++i )
   {
       std::string strHexValue;
       rr.width(HexFieldWidth); // read at most 2 characters
       rr >> strHexValue;
       std::istringstream ss(strHexValue);
       ss.setf(std::ios::hex, std::ios::basefield);
       int intHexValue(0);
       ss >> intHexValue;
       rmsg[i] = static_cast<Byte>(intHexValue);
   }

Here I define a std::istringstream (include <sstream>) and initialise it with the previously read strHexValue std::string. I then set the basefield flag on this stream to std::ios::hex for hexadecimal formatted integer character streams and extract the field value into a temporary int, to get around the possibility of an unsigned char being interpreted as a character rather than an integer. I then assign the int to the relevant rmsg Byte field item. The static_cast to Byte is to suppress compiler warnings about losing data, as int is generally a larger type than the char types. Note that in this version setting the std::ios::hex flag on the std::ifstream is redundant.
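Pulled together into something you can test on its own, the first option might be sketched as below. The helper readHexBytes is a hypothetical name, it takes any std::istream so that a std::istringstream can stand in for the file, and I am again assuming Byte is unsigned char. The validation is deliberately minimal: a field such as "fg" would be accepted as 0xf.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

typedef unsigned char Byte; // assumption: the question's Byte is unsigned char

// Hypothetical helper: read up to `count` two-digit hex fields from `in`.
// Stops early if the input runs out or a field does not start with a hex digit.
std::vector<Byte> readHexBytes(std::istream & in, std::size_t count)
{
    std::vector<Byte> bytes;
    for (std::size_t i = 0; i != count; ++i)
    {
        std::string field;
        if (!(in >> std::setw(2) >> field)) // at most 2 characters per field
            break;
        std::istringstream ss(field);
        int value(0);
        if (!(ss >> std::hex >> value) || value < 0)
            break;
        bytes.push_back(static_cast<Byte>(value));
    }
    assert(bytes.size() <= count);
    return bytes;
}
```

A caller can compare the size of the returned vector with the count requested to detect short or malformed input.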

If you were using a zero terminated C string char array buffer rather than a std::string then you would have to use the deprecated std::istrstream char * C-string stream (include <strstream>). Or you could convert the buffer to a std::string first (e.g. create a std::string initialised using the C string buffer).
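As a sketch of the convert-to-std::string route, the function below reads one field with get into a char array, builds a std::string from it and then converts that with a std::istringstream as before. The name readHexFieldWithGet and the -1 error return are assumptions made for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read one two-digit hex field with get() and convert it.
// Returns the value 0..255, or -1 if nothing could be read or converted.
int readHexFieldWithGet(std::istream & in)
{
    std::size_t const bufferSize(3);  // two digits plus the terminating zero
    char buffer[bufferSize];
    if (!in.get(buffer, bufferSize))  // stores at most bufferSize-1 characters
        return -1;
    assert(in.gcount() > 0);          // a successful get stored something
    std::string const field(buffer);  // C string -> std::string
    std::istringstream ss(field);
    int value(-1);
    if (!(ss >> std::hex >> value) || value < 0)
        return -1;
    return value;
}
```

Unlike operator>>, get does not skip leading white space and stops at a newline, so this version suits data written as one unbroken run of digits.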

The second option would look as follows:

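   typedef std::num_get<char, std::string::iterator> HexNumGet;
   // Install a facet for our iterator type in a copy of the current locale;
   // the locale takes ownership of the facet object.
   std::locale const loc( std::locale(), new HexNumGet );
   HexNumGet const & numGetFacet( std::use_facet<HexNumGet>(loc) );
   for ( size_t i=HexFieldsBeginIndex; rr && i != HexFieldsEndIndex; ++i )
   {
       std::string strHexValue;
       rr.width(HexFieldWidth); // read at most 2 characters
       rr >> strHexValue;
       unsigned short ushortHexValue(0);
       std::ios_base::iostate err(std::ios_base::goodbit);
       numGetFacet.get(strHexValue.begin(), strHexValue.end(), rr, err, ushortHexValue );
       if ( (err&std::ios_base::failbit) || (err&std::ios_base::badbit) )
       {
         std::cerr << "Bad data in data file " << DataFileName << '.' << std::endl;
         return 2; // from example code main
       }
       rmsg[i] = static_cast<Byte>(ushortHexValue);
   }

First I obtain a std::num_get facet. The standard only guarantees that a locale holds the std::num_get facets for the default std::istreambuf_iterator iterator type, and on some implementations asking std::use_facet for any other iterator type throws std::bad_cast, so I install a facet for std::string::iterator in a copy of the current locale and fetch it from there. The locale had better match the locale in force at the time the data was written - if not then use a specific locale matching that used to write the data. Note that unless the code actually makes an effort to set locales and facets then the default locale will always be the C locale, with the default C locale facets.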

As you can see I need to ask for a std::num_get facet (include <locale>) that matches both the character type in use and the type of the iterators we are going to use to provide the sequence of characters to convert to a numeric value. If you were using a C string buffer rather than a std::string then you could specify char * as the iterator type.

We have to call a get member function on (the reference to) this facet. I chose the overload taking an unsigned short - as there is such an overload of get for this type but not one for int (it is handled by that for long).

As you can see we pass in a sequence of characters specified by start and end iterator, a stream from which to take formatting flags (which is why I set the std::ios::hex flag on the rr stream), a reference to an object to pass back IOStream-style error flags and a reference to an object to accept the converted value.

If you were using a C string buffer the call to the get member function would look something like:

     numGetFacet.get(strHexValue, strHexValue+HexCStringBufferSize, rr, err, ushortHexValue );

After returning from the num_get get call I check to see if any errors occurred - specifically whether the fail bit or bad bit is set - and return if either is set.

If all is OK I set the relevant Byte in rmsg similarly to that for the std::istringstream case.
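For experimenting with the facet approach away from the file handling, here is a self-contained sketch. The names StringNumGet and hexToUShort are hypothetical, a std::istringstream is used purely as a source of formatting flags, and the facet is installed into a copy of the global locale explicitly because a num_get for a non-default iterator type is not guaranteed to be present in a locale:

```cpp
#include <cassert>
#include <ios>
#include <locale>
#include <sstream>
#include <string>

typedef std::num_get<char, std::string::const_iterator> StringNumGet;

// Hypothetical helper: convert hex text to an unsigned short via a num_get facet.
// Returns false, leaving `value` untouched, if the conversion failed.
bool hexToUShort(std::string const & hex, unsigned short & value)
{
    // Install a facet for our iterator type; the locale owns and deletes it.
    std::locale const loc(std::locale(), new StringNumGet);
    assert(std::has_facet<StringNumGet>(loc));
    StringNumGet const & facet(std::use_facet<StringNumGet>(loc));

    std::istringstream formatSource; // only supplies flags and locale to get()
    formatSource.setf(std::ios::hex, std::ios::basefield);

    std::ios_base::iostate err(std::ios_base::goodbit);
    unsigned short result(0);
    facet.get(hex.begin(), hex.end(), formatSource, err, result);
    if (err & (std::ios_base::failbit | std::ios_base::badbit))
        return false;
    value = result;
    return true;
}
```

Building the locale on every call is wasteful; in real code you would create it once and reuse the facet reference.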

For more information on the C++ standard library I suggest you obtain _and use_ a good reference such as "The C++ Standard Library A Tutorial and Reference" by Nicolai M. Josuttis or - specifically for IOStreams and Locales - "Standard C++ IOStreams and Locales" by Langer and Kreft.

Hope this is of use.

