Here’s another problem dealing with roman numbers. The problem is basically, given a string, find the largest roman number you can extract from it. One important thing to note in this problem is that unlike “normal” roman numbers, where “M” can only appear three(3) times max, the “M” here can be as many as possible.

The first step would be filtering out the unnecessary non-roman characters. If we’re planning to scan the string as lot of times, we might as well remove the non-roman characters first. Else, if we’re just going to scan it once, we might as well just scan it and not bother removing the non-roman characters first. Nevertheless this is just a trivial decision so the samples following this will only contain roman numbers for clarity.

Since the problem is asking for the largest roman number (in decimal) in a string, we should follow the “greedy” process of getting the numbers. We scan the string from left to right, looking for roman characters in descending order (From M to I) without backtracking (since we read from left to right at least for English right?). So, for example, if we have CMMV, we’ll scan for M first, getting the first two M’s, and ignoring the C (though CM is valid, it’s not the most you can get out of it). Then we should get the value next to M which is C, but since the only C is before the M’s (remember, we can’t go back), we’ll have to move on and get V. So all in all, we end up with MMV = 2005 (as opposed if you get CM, you’ll have to skip the second M and end up with CMV = 905 which is less).

Follow up:

That was easy, and now for more complicated rules. First of all, there are character limits. M has no problem since the problem states that there can be any number of M’s. The “fives”, namely D, L, and V can only appear at most once. The “ones”, C, X and I can appear at most three times with the exception of forming “nines” and “fours” ( IX, IV, CM, CD, etc.). The formation of “fours” could be ignored since we’re better off having the “five” instead of the “four” (e.x. for XXXLXX, we are better off with LXX = 70 over XL = 40). However, the “nines” are a different matter since the “ones” can appear more than once (e.x. XXXIXV, after grabbing the 3 X’s, we end up with IXV, at this point we could either go for IV = 4 V = 5 or IX = 9). So if we can grab a “nine” from the remaining string, we skip all values in the same place value and move to the next place value. CM could be ignored, however (just get the M and ignore the C), since M can appear any number of times. We actually are not just scanning just for M > D > C > L > X > V > I, but is scanning for M > D > C > ( XC | (L > X) ) > (IX | (V > I) ). To put it in English:

  1. Grab all M
  2. Grab the first D if it exists
  3. Grab up to three C’s
  4. Grab XC (it is implied that this is impossible if all of the C’s are consumed by the previous step)
  5. If we can’t find it (XC), go grab an L and up to three X’s
  6. Grab IX
  7. If we can’t find it (IX), go grab a V and up to three I’s

Here are more samples:

  • XXXXX => We grab 3 X’s and drop the rest
  • MXMIMVMCM => We just grab all the M’s MMMMM = 5000
  • IXIXXIXVXI => We grab 3 X’s leaving us with IXVXI, we just grab IX and drop the rest.
  • CMDCXCL => Grab the M, grab the D, grab 2 C’s and finally the L

And now for the implementation:
As you may notice, the scan involves a lot of jumping/searching for characters and backtracking(i.e. to get all M’s we need to get the last M; to get XC, we must look for X, then look for C after X, if we can’t find it, we fall back to looking for D and then C ). If we just scan the string as it is, even if we filter out the non-roman characters. Our algorithm will consume a lot of time just to look for a certain character (e.x. MIIIIIIIIIIIIIIIIC) as an extra note, the maximum number of characters in a line is 10000 characters. One technique to overcome this is by indexing all the positions of the character in a table. As in, we have a map of arrays which map to all the positions of that character in order so we can access it with something like charMap[’x'][0], instead of having a for loop everytime. We start off by scanning the code once, indexing the positions of the roman characters. In my case, I have an array which maps the roman characters to the actual array instead of using switch..case, map or hashmap.
To find the character I’m looking for, I would just have to look at the character map instead of searching the array over and over again. This would save us a lot of time since we are only going through the positions of the character we’re looking for and save us some comparisons.

Here’s the final code:

C++:

#include <iostream>
#include <vector>
#include <map>
#include <string>
#include <algorithm>

using namespace std;
//#define MYDBG
#ifdef MYDBG
#include <fstream>
ifstream fin("input.txt");
#define cin fin
#endif

char romanChars[‘z’+1] = { 0 } ;

enum romanToIndex { I = 0, V, X, L, C, D, M };

//Returns: The index of the toFind array which is > offset
int findAfter( int offset, const vector<int> &toFind ) {
        int ret = -1;
        for ( unsigned int i = 0; i < toFind.size(); ++i ) {
                if ( toFind[i] >= offset ) {
                        ret = i;
                        break;
                }
        }
        return ret;
}

int main() {
        romanChars[‘i’] = I + 1; romanChars[‘v’] = V + 1; romanChars[‘x’] = X + 1; romanChars[‘l’] = L + 1;
        romanChars[‘c’] = C + 1; romanChars[‘d’] = D + 1; romanChars[‘m’] = M + 1;

        string buffer;
        buffer.reserve( 10000 );
        do {
                vector<int> characterPositions[7];
                if ( !getline( cin, buffer ) ) break;

                for ( unsigned int i = 0; i < buffer.size(); ++i ) {
                        if ( romanChars[ buffer[i] ] )
                                characterPositions[romanChars[ buffer[i] ] - 1].push_back( i );
                }

                int currentPos;
                unsigned int maxVal;
                //chomp all M’s
                if ( characterPositions[M].size() ) {
                        currentPos = characterPositions[M].back() + 1;
                        maxVal = characterPositions[M].size() * 1000;
                } else
                        currentPos = maxVal = 0;

                //find the first ‘d’ after the last ‘m’
                int index = findAfter( currentPos, characterPositions[D] );
                if ( index != -1 ) {
                        maxVal += 500;
                        currentPos = characterPositions[D][index] + 1;
                }
               
                //chomp up to 3 c’s
                for ( unsigned int i = 0; i < 3; i++ ) {
                        index = findAfter( currentPos, characterPositions[C] );
                        if ( index != -1 ) {
                                maxVal += 100;
                                currentPos = characterPositions[C][index] + 1;
                        } else
                                break;
                }

                //chomp the first X, then see if we can get "XC" for a 90 or not
                index = findAfter( currentPos, characterPositions[X] );
                if ( index != -1 ) {
                        //try to find a C i.e. XC to get 90
                        index = findAfter( characterPositions[X][index], characterPositions[C] );
                        if ( index != -1 ) {
                                maxVal += 90 ;
                                currentPos = characterPositions[C][index] + 1;
                               
                        } else {

                                //we fail, try getting L, then try to chomp 3 X’s
                                index = findAfter( currentPos, characterPositions[L] );
                                if ( index != -1 ) {
                                        maxVal += 50;
                                        currentPos = characterPositions[L][index] + 1;
                                }

                                for( unsigned int i = 0; i < 3; ++i ) {
                                        index = findAfter( currentPos, characterPositions[X] );
                                        if ( index != -1 ) {
                                                maxVal += 10;
                                                currentPos = characterPositions[X][index] + 1;
                                        } else
                                                break;
                                }
                        }
                } else {
                        //no "X" found, let’s try our luck getting "L"
                        index = findAfter( currentPos, characterPositions[L] );
                        if ( index != -1 ) {
                                maxVal += 50;
                                currentPos = characterPositions[L][index] + 1;
                        }
                }

                //same case, but replace X with I, L with V and C with X
                index = findAfter( currentPos, characterPositions[I] );
                if ( index != -1 ) {
                        //try to find a I i.e. IX to get 9
                        index = findAfter( characterPositions[I][index], characterPositions[X] );
                        if ( index != -1 ) {
                                maxVal += 9 ;
                                currentPos = characterPositions[X][index] + 1;
                               
                        } else {

                                //we fail, try getting V, then try to chomp 3 I’s
                                index = findAfter( currentPos, characterPositions[V] );
                                if ( index != -1 ) {
                                        maxVal += 5;
                                        currentPos = characterPositions[V][index] + 1;
                                }

                                for( unsigned int i = 0; i < 3; ++i ) {
                                        index = findAfter( currentPos, characterPositions[I] );
                                        if ( index != -1 ) {
                                                maxVal += 1;
                                                currentPos = characterPositions[I][index] + 1;
                                        } else
                                                break;
                                }
                        }
                } else {
                        //no "X" found, let’s try our luck getting "L"
                        index = findAfter( currentPos, characterPositions[V] );
                        if ( index != -1 ) {
                                maxVal += 5;
                                currentPos = characterPositions[V][index];
                        }
                }
                cout << maxVal << endl;
        } while ( !cin.eof() );
#ifdef MYDBG
        system("pause");
#endif
        return 0;
}

An astute reader may note that I could improve the performance of findAfter() by allowing provisions to start where it last left off by introducing a parameter. So it could look like this:

C++:

int findAfter( int offset, const vector<int> &toFind, unsigned int start ) {
        int ret = -1;
        for ( unsigned int i = start; i < toFind.size(); ++i ) {
                if ( toFind[i] >= offset ) {
                        ret = i;
                        break;
                }
        }
        return ret;
}

And just pass the index where it left off accordingly. However, even without this optimization, the code managed to run under time limit. And further more, this may introduce additional complications in the code.

Trackback address for this post

Trackback URL (right click and copy shortcut/link location)

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)