Here’s another problem dealing with roman numbers. The problem is basically, given a string, find the largest roman number you can extract from it. One important thing to note in this problem is that unlike “normal” roman numbers, where “M” can only appear three(3) times max, the “M” here can be as many as possible.
The first step would be filtering out the unnecessary non-roman characters. If we’re planning to scan the string as lot of times, we might as well remove the non-roman characters first. Else, if we’re just going to scan it once, we might as well just scan it and not bother removing the non-roman characters first. Nevertheless this is just a trivial decision so the samples following this will only contain roman numbers for clarity.
Since the problem is asking for the largest roman number (in decimal) in a string, we should follow the “greedy” process of getting the numbers. We scan the string from left to right, looking for roman characters in descending order (From M to I) without backtracking (since we read from left to right at least for English right?). So, for example, if we have CMMV, we’ll scan for M first, getting the first two M’s, and ignoring the C (though CM is valid, it’s not the most you can get out of it). Then we should get the value next to M which is C, but since the only C is before the M’s (remember, we can’t go back), we’ll have to move on and get V. So all in all, we end up with MMV = 2005 (as opposed if you get CM, you’ll have to skip the second M and end up with CMV = 905 which is less).
Follow up:
That was easy, and now for more complicated rules. First of all, there are character limits. M has no problem since the problem states that there can be any number of M’s. The “fives”, namely D, L, and V can only appear at most once. The “ones”, C, X and I can appear at most three times with the exception of forming “nines” and “fours” ( IX, IV, CM, CD, etc.). The formation of “fours” could be ignored since we’re better off having the “five” instead of the “four” (e.x. for XXXLXX, we are better off with LXX = 70 over XL = 40). However, the “nines” are a different matter since the “ones” can appear more than once (e.x. XXXIXV, after grabbing the 3 X’s, we end up with IXV, at this point we could either go for IV = 4 V = 5 or IX = 9). So if we can grab a “nine” from the remaining string, we skip all values in the same place value and move to the next place value. CM could be ignored, however (just get the M and ignore the C), since M can appear any number of times. We actually are not just scanning just for M > D > C > L > X > V > I, but is scanning for M > D > C > ( XC | (L > X) ) > (IX | (V > I) ). To put it in English:
Here are more samples:
And now for the implementation:
As you may notice, the scan involves a lot of jumping/searching for characters and backtracking(i.e. to get all M’s we need to get the last M; to get XC, we must look for X, then look for C after X, if we can’t find it, we fall back to looking for D and then C ). If we just scan the string as it is, even if we filter out the non-roman characters. Our algorithm will consume a lot of time just to look for a certain character (e.x. MIIIIIIIIIIIIIIIIC) as an extra note, the maximum number of characters in a line is 10000 characters. One technique to overcome this is by indexing all the positions of the character in a table. As in, we have a map of arrays which map to all the positions of that character in order so we can access it with something like charMap[’x'][0], instead of having a for loop everytime. We start off by scanning the code once, indexing the positions of the roman characters. In my case, I have an array which maps the roman characters to the actual array instead of using switch..case, map or hashmap.
To find the character I’m looking for, I would just have to look at the character map instead of searching the array over and over again. This would save us a lot of time since we are only going through the positions of the character we’re looking for and save us some comparisons.
Here’s the final code:
C++:
| #include <iostream> | |
| #include <vector> | |
| #include <map> | |
| #include <string> | |
| #include <algorithm> | |
| using namespace std; | |
| //#define MYDBG | |
| #ifdef MYDBG | |
| #include <fstream> | |
| ifstream fin("input.txt"); | |
| #define cin fin | |
| #endif | |
| char romanChars[‘z’+1] = { 0 } ; | |
| enum romanToIndex { I = 0, V, X, L, C, D, M }; | |
| //Returns: The index of the toFind array which is > offset | |
| int findAfter( int offset, const vector<int> &toFind ) { | |
| int ret = -1; | |
| for ( unsigned int i = 0; i < toFind.size(); ++i ) { | |
| if ( toFind[i] >= offset ) { | |
| ret = i; | |
| break; | |
| } | |
| } | |
| return ret; | |
| } | |
| int main() { | |
| romanChars[‘i’] = I + 1; romanChars[‘v’] = V + 1; romanChars[‘x’] = X + 1; romanChars[‘l’] = L + 1; | |
| romanChars[‘c’] = C + 1; romanChars[‘d’] = D + 1; romanChars[‘m’] = M + 1; | |
| string buffer; | |
| buffer.reserve( 10000 ); | |
| do { | |
| vector<int> characterPositions[7]; | |
| if ( !getline( cin, buffer ) ) break; | |
| for ( unsigned int i = 0; i < buffer.size(); ++i ) { | |
| if ( romanChars[ buffer[i] ] ) | |
| characterPositions[romanChars[ buffer[i] ] - 1].push_back( i ); | |
| } | |
| int currentPos; | |
| unsigned int maxVal; | |
| //chomp all M’s | |
| if ( characterPositions[M].size() ) { | |
| currentPos = characterPositions[M].back() + 1; | |
| maxVal = characterPositions[M].size() * 1000; | |
| } else | |
| currentPos = maxVal = 0; | |
| //find the first ‘d’ after the last ‘m’ | |
| int index = findAfter( currentPos, characterPositions[D] ); | |
| if ( index != -1 ) { | |
| maxVal += 500; | |
| currentPos = characterPositions[D][index] + 1; | |
| } | |
| | |
| //chomp up to 3 c’s | |
| for ( unsigned int i = 0; i < 3; i++ ) { | |
| index = findAfter( currentPos, characterPositions[C] ); | |
| if ( index != -1 ) { | |
| maxVal += 100; | |
| currentPos = characterPositions[C][index] + 1; | |
| } else | |
| break; | |
| } | |
| //chomp the first X, then see if we can get "XC" for a 90 or not | |
| index = findAfter( currentPos, characterPositions[X] ); | |
| if ( index != -1 ) { | |
| //try to find a C i.e. XC to get 90 | |
| index = findAfter( characterPositions[X][index], characterPositions[C] ); | |
| if ( index != -1 ) { | |
| maxVal += 90 ; | |
| currentPos = characterPositions[C][index] + 1; | |
| | |
| } else { | |
| //we fail, try getting L, then try to chomp 3 X’s | |
| index = findAfter( currentPos, characterPositions[L] ); | |
| if ( index != -1 ) { | |
| maxVal += 50; | |
| currentPos = characterPositions[L][index] + 1; | |
| } | |
| for( unsigned int i = 0; i < 3; ++i ) { | |
| index = findAfter( currentPos, characterPositions[X] ); | |
| if ( index != -1 ) { | |
| maxVal += 10; | |
| currentPos = characterPositions[X][index] + 1; | |
| } else | |
| break; | |
| } | |
| } | |
| } else { | |
| //no "X" found, let’s try our luck getting "L" | |
| index = findAfter( currentPos, characterPositions[L] ); | |
| if ( index != -1 ) { | |
| maxVal += 50; | |
| currentPos = characterPositions[L][index] + 1; | |
| } | |
| } | |
| //same case, but replace X with I, L with V and C with X | |
| index = findAfter( currentPos, characterPositions[I] ); | |
| if ( index != -1 ) { | |
| //try to find a I i.e. IX to get 9 | |
| index = findAfter( characterPositions[I][index], characterPositions[X] ); | |
| if ( index != -1 ) { | |
| maxVal += 9 ; | |
| currentPos = characterPositions[X][index] + 1; | |
| | |
| } else { | |
| //we fail, try getting V, then try to chomp 3 I’s | |
| index = findAfter( currentPos, characterPositions[V] ); | |
| if ( index != -1 ) { | |
| maxVal += 5; | |
| currentPos = characterPositions[V][index] + 1; | |
| } | |
| for( unsigned int i = 0; i < 3; ++i ) { | |
| index = findAfter( currentPos, characterPositions[I] ); | |
| if ( index != -1 ) { | |
| maxVal += 1; | |
| currentPos = characterPositions[I][index] + 1; | |
| } else | |
| break; | |
| } | |
| } | |
| } else { | |
| //no "X" found, let’s try our luck getting "L" | |
| index = findAfter( currentPos, characterPositions[V] ); | |
| if ( index != -1 ) { | |
| maxVal += 5; | |
| currentPos = characterPositions[V][index]; | |
| } | |
| } | |
| cout << maxVal << endl; | |
| } while ( !cin.eof() ); | |
| #ifdef MYDBG | |
| system("pause"); | |
| #endif | |
| return 0; | |
| } |
An astute reader may note that I could improve the performance of findAfter() by allowing provisions to start where it last left off by introducing a parameter. So it could look like this:
C++:
| int findAfter( int offset, const vector<int> &toFind, unsigned int start ) { | |
| int ret = -1; | |
| for ( unsigned int i = start; i < toFind.size(); ++i ) { | |
| if ( toFind[i] >= offset ) { | |
| ret = i; | |
| break; | |
| } | |
| } | |
| return ret; | |
| } |
And just pass the index where it left off accordingly. However, even without this optimization, the code managed to run under time limit. And further more, this may introduce additional complications in the code.
Trackback URL (right click and copy shortcut/link location)