Pages

Monday, January 11, 2010

SQL Challenge

Here is a challenge that takes you away from those repetitive boring type of queries that you write over and over again, several times a day. All of us, the database people, are familiar with thinking in set based manner as well as row by row style. Here is something that is very interesting where you might need to process records in a 'three-line-at-a-time' fashion.

For the purpose of this challenge, imagine that you are working for a bank which just decided to scan all the banking documents. Assume that they have an old fashioned scanner which scans the documents and produces a text file with the customer number. So far so good. Well, not really! Unfortunately the scanner produces a graphical representation of the customer number using three lines of symbols: space, unerscores and pipe characters.

Here is an example of the output produced by the scanner.

  





Here are the rules to keep in mind while reading and recognizing the output generated by the scanner.

  • Each digit is represented using 9 cells (3x3)
  • Only spaces, underscores and pipe characters are used
  • The number of digits in each account number may vary.
  • The Scanner is not 100% reliable and it might produce some digits that are invalid

The Challenge

Your job is to read the output produced by the scanner and identify the the customer number represented by each image. Remember that the scanner is not very reliable and it might produce invalid digit representations. For each digit that is not valid, set the value to 'X'

Sample Data

Here is the sample data for this challenge. Please take care with spaces, tabs and carriage returns as each digit is represented by three lines of text and if a space, tab or carriage return is misplaced, the whole image will be distorted.

Id          ScanNumber
----------- ---------------------------
1            _  _  _  _  _  _  _  _  _  
            | || || || || |  || ||_ |_|  
            |_||_||_||_||_|  ||_| _| _| 
                           
2               _  _  _  _  _  _     _ 
            |_||_|| || ||_   |  |  ||_ 
              | _||_||_||_|  |  |  | _|
                           
3            _  _  _     _  _  _  _  _  
            |_ |_|| || ||_ |_| _|  ||_| 
            |_||_||_||_||_||_||_   | _| 
                           
4               _  _  _  _  _  _     _ 
            |_||_|| ||_||_   |  |  ||_ 
              | _||_||_||_|  |  |  ||_|
                           
5               _  _  _  _  _  _     _ 
            | ||_|| ||_||_   |  |  ||_ 
              | _||_||_||_|  |  |  ||_|
                           
6            _     _  _     _  _  _  _ 
            | |  | _| _||_||_ |_   ||_|
            |_|  ||_  _|  | _||_|  ||_|


Expected Results

Based on the sample input and the rules discussed earlier, here is the expected output.

Id          Value
----------- ---------
1           000007059
2           490067715
3           680X68279
4           490867716
5           X90867716
6           012345678


Sample Scripts

Use the following script to generate the sample data for this challenge.

DECLARE @t TABLE (Id int, ScanNumber NVARCHAR(116))
 
INSERT INTO @t
SELECT  1,--> 000 007 059
'_  _  _  _  _  _  _  _  _ 
| || || || || |  || ||_ |_|
|_||_||_||_||_|  ||_| _| _|
                           
' UNION 
SELECT 2,-->  490 067 715
'   _  _  _  _  _  _     _ 
|_||_|| || ||_   |  |  ||_ 
  | _||_||_||_|  |  |  | _|
                           
' UNION
SELECT  3, --> 680 X68 279
'_  _  _     _  _  _  _  _ 
|_ |_|| || ||_ |_| _|  ||_|
|_||_||_||_||_||_||_   | _|
                           
' UNION
SELECT  4,--> 490 867 716
'   _  _  _  _  _  _     _ 
|_||_|| ||_||_   |  |  ||_ 
  | _||_||_||_|  |  |  ||_|
                           
'  UNION
SELECT  5,--> X90 867 716
'   _  _  _  _  _  _     _ 
| ||_|| ||_||_   |  |  ||_ 
  | _||_||_||_|  |  |  ||_|
                           
' 
UNION 
SELECT 6,--> 012 345 678
'_     _  _     _  _  _  _ 
| |  | _| _||_||_ |_   ||_|
|_|  ||_  _|  | _||_|  ||_|
                         
Notes
  1. Each record may have more than three lines of data (each line is separated by a CR and LF). Your code should consider only the first three lines.
  2. The length of the first three lines of each recrd will always be the same and will be divisible by three.
  3. There may be 3x3 blocks of spaces in the string. In such a case, you should generate an empty space in the output. If a 3x3 block does not create a valid digit (except for the case of a 3x3 block of spaces), you should generate an "X".
  4. The number of 3x3 blocks in each record may vary

0 comments:

Post a Comment