Location : Noida (UP) India
Education : Graduate or above in computer science / engineering from a recognized institute
A string consists of words, defined as continuous runs of alphanumeric characters separated by separators (spaces, commas, periods, semicolons, exclamation marks, or any other punctuation symbol except apostrophes).
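The word definition above can be sketched as a small tokenizer. This is only an illustration, not a reference solution: the name split_words is hypothetical, and it assumes that an apostrophe (straight or curly) inside a word keeps the word together, so "Dinah'll" counts as one word, while every other non-alphanumeric character acts as a separator.

```python
import re

# Assumption: an apostrophe between alphanumeric runs stays inside the word;
# any other non-alphanumeric character separates words.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['’][A-Za-z0-9]+)*")


def split_words(text):
    """Return the words found in text, in order of appearance."""
    return WORD_RE.findall(text)
```

Under this assumption a hyphenated form such as "to-night" splits into two words, because the hyphen is a punctuation symbol other than an apostrophe.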
So a string might look like this :
The hungry scanner keeps a suspicious watch on doctors and their unsuspecting patients
The scanner counts words, collecting those together where a common substring of length 4 or greater occurs.
For example, in the given sentence, suspicious and unsuspecting share the substring "susp", which has length 4. Thus the scanner would output something like this:
The: 1
hungry: 1
scanner: 1
keeps: 1
a: 1
suspicious, unsuspecting: 2
watch: 1
on: 1
doctors: 1
and: 1
their: 1
patients: 1
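One possible way to produce such a table is sketched below. It relies on the observation that two words share a substring of length 4 or greater exactly when they share a substring of length exactly 4, so indexing every 4-character window is enough. The names four_grams and group_words are hypothetical, and the sketch makes three assumptions the task does not state: comparison is case-insensitive, words are reported in lower case, and grouping is transitive (if A matches B and B matches C, all three land on one line).

```python
from collections import Counter


def four_grams(word, k=4):
    """All substrings of length k in word (empty if the word is shorter)."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def group_words(words, k=4):
    """Group words that share a substring of length >= k.

    Returns a list of (members, joint_count) in order of first appearance.
    """
    counts = Counter()
    parent = {}      # union-find forest over distinct (lower-cased) words
    gram_owner = {}  # k-gram -> the first word seen that contains it

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for word in words:
        key = word.lower()
        counts[key] += 1
        if key in parent:
            continue  # already indexed; only the count changes
        parent[key] = key
        for gram in four_grams(key, k):
            if gram in gram_owner:
                # Merge this word's set with the set owning the shared gram.
                parent[find(key)] = find(gram_owner[gram])
            else:
                gram_owner[gram] = key

    groups = {}
    for key in parent:  # dicts keep insertion order, i.e. first appearance
        groups.setdefault(find(key), []).append(key)
    return [(m, sum(counts[w] for w in m)) for m in groups.values()]


if __name__ == "__main__":
    sentence = ("The hungry scanner keeps a suspicious watch on doctors "
                "and their unsuspecting patients")
    for members, count in group_words(sentence.split()):
        print(", ".join(members) + ": " + str(count))
```

Because group_words accepts any iterable, it can consume words one at a time, and its memory use grows with the number of distinct words rather than with the length of the input.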
Write a function that can work on a very long string stored in a file and output a table as above. Each set of common-substring words should be listed on one line, along with a count of their joint occurrence. To conserve system resources, please avoid loading the entire file into memory.
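A minimal sketch of one way to meet the memory requirement is to read the file in fixed-size chunks and hold back any word that might continue into the next chunk. Reading line by line would not help if the whole string sits on a single line. The name iter_words, the 64 KiB default chunk size, and the UTF-8 encoding are assumptions made for illustration, as is the rule that apostrophes stay inside words.

```python
import re

# Assumption: apostrophes (straight or curly) inside a word do not split it.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['’][A-Za-z0-9]+)*")


def iter_words(path, chunk_size=64 * 1024, encoding="utf-8"):
    """Yield the words of the file at path, reading chunk_size characters at a time."""
    tail = ""  # trailing partial word carried over from the previous chunk
    with open(path, encoding=encoding) as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            # Hold back the trailing run of word characters: the word may
            # continue in the next chunk, so it is not emitted yet.
            cut = len(buf)
            while cut > 0 and (buf[cut - 1].isalnum() or buf[cut - 1] in "'’"):
                cut -= 1
            tail = buf[cut:]
            yield from WORD_RE.findall(buf[:cut])
    # Whatever remains after the last chunk is a complete word (or nothing).
    yield from WORD_RE.findall(tail)
```

Only one chunk plus the held-back partial word is in memory at any time. Any table of distinct words built from this stream still grows with the vocabulary of the text, which is usually far smaller than the file itself.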
Use the following text (in a file) as an input to test your program:
Down, down, down. There was nothing else to do, so Alice soon began talking again. “Dinah’ll miss me very much to-night, I should think!” (Dinah was the cat.) “I hope they’ll remember her saucer of milk at tea-time. Dinah my dear! I wish you were down here with me! There are no mice in the air, I’m afraid, but you might catch a bat, and that’s very like a mouse, you know. But do cats eat bats, I wonder?” And here Alice began to get rather sleepy, and went on saying to herself, in a dreamy sort of way, “Do cats eat bats? Do cats eat bats?” and sometimes, “Do bats eat cats?” for, you see, as she couldn’t answer either question, it didn’t much matter which way she put it. She felt that she was dozing off, and had just begun to dream that she was walking hand in hand with Dinah, and saying to her very earnestly, “Now, Dinah, tell me the truth: did you ever eat a bat?” when suddenly, thump! thump! down she came upon a heap of sticks and dry leaves, and the fall was over.
Common root words.
As you may suspect, the length-4 common-substring heuristic is just a simple, approximate technique for finding words with common roots. It is liable to fail in many situations. One such failure is unrelated words that happen to share a substring. For example, consider the following words:
Ionization, Ionic, Actualization, Actual
Clearly the first two words have a common root, as do the last two. Unfortunately, the simple approach also identifies Ionization and Actualization as having a common root due to the presence of the string "tion". We have a situation where Actualization could be counted in two slots.
Find an approach to "break ties" in these cases. What logic can you apply to declare that [Actual, Actualization] is a better match than [Actualization, Ionization] ?
Describe and implement your logic as a separate subroutine called from the function implemented in Q1.
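One candidate tie-break, offered as a sketch rather than the expected answer, is to prefer the candidate that shares the longest common prefix with the word, on the assumption that English roots tend to sit at the start of a word while suffixes such as "ization" are shared by unrelated words. The length of the longest common substring serves only as a secondary key. Comparing longest common substrings alone would not be enough here: Actualization and Ionization share "ization" (7 characters), which is longer than "Actual" (6 characters). All function names below are hypothetical, and the heuristic has known weaknesses, for example with prefixed words such as unsuspecting.

```python
def common_prefix_len(a, b):
    """Length of the shared leading run of a and b, ignoring case."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n


def longest_common_substring_len(a, b):
    """Length of the longest common substring (row-by-row dynamic programme)."""
    a, b = a.lower(), b.lower()
    best = 0
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, other in enumerate(b, start=1):
            if ch == other:
                cur[j] = prev[j - 1] + 1  # extend the match ending just before
                best = max(best, cur[j])
        prev = cur
    return best


def best_match(word, candidates):
    """Pick the candidate judged most likely to share a root with word.

    Primary key: common prefix length. Secondary key: longest common substring.
    """
    return max(
        candidates,
        key=lambda c: (common_prefix_len(word, c),
                       longest_common_substring_len(word, c)),
    )
```

With this rule, best_match("Actualization", ["Ionization", "Actual"]) selects "Actual", because the shared prefix of length 6 outranks a shared suffix of any length.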
To submit your work, send your running code as a text file by email to firstname.lastname@example.org with the subject "SDE Assignment - Feb 2021".
Candidates will be considered in the order of assignment submission. Please submit your assignment as soon as possible.