1- Give students the chance to practice a programming language that will be needed in the course.
2- Handle and understand semi-structured data.
3- Extract the required information when it is not possible to use SQL queries or database techniques.
4- Find patterns in data.
5- Find significant entities that have special characteristics.
6- Give students the chance to perform some data analytics steps.
1- You will be given a file named hobbies.txt.
a. This file contains a group of fictitious Facebook users and their hobbies.
b. Each line in the file contains a user/username followed by a list of that user's hobbies.
c. The data in each line is delimited by commas.
d. For instance, in the line: 2254,reading,coding,swimming,playing soccer,
i. The user/username is: 2254
ii. The hobbies are: reading, coding, swimming, and playing soccer
iii. The number and type of hobbies may differ from one user to another.
2- This file is the data set your code must read in order to do the following:
a. Finding circles/networks of friends:
i. In each circle you report, all the users should share at least x hobbies.
ii. x is a variable that a user can input to the program.
iii. Circles of friends should be written to a file named circles.txt.
iv. Each line should contain the usernames in the circle/network you found (separated by commas), a tab character, and the list of shared hobbies (separated by commas).
v. For example, a line may look like: 2254,552,1258 reading,swimming,hiking
b. Finding popular users:
i. A user is popular if they belong to at least y circles/networks.
ii. y is a variable that the user can input to the program.
iii. Popular users should be written to a file named popular.txt. Each line should contain one user and the number of circles/networks that user belongs to, separated by a tab character.
iv. For instance: 2254 5
v. This step should occur after step (a.).
vi. Hint: You may want to save the circles you found in part (a.) in some data structure so that you can use them in this part.
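The steps above can be sketched in Python as follows. The assignment does not pin down exactly which groups count as a circle, so this sketch assumes one reading: for every hobby set that is the common part of two or more users' hobbies and has at least x hobbies, the circle is all users who have every hobby in that set. It also assumes that empty fields (such as the one produced by the trailing comma in the sample line) are ignored and that a user appearing on several lines has their hobbies merged. All function names are hypothetical.

```python
# Sketch of steps 1-2 under an ASSUMED circle definition (see lead-in).
# All function names are hypothetical.


def parse_line(line):
    """'2254,reading,coding,' -> ('2254', set of hobbies); None for blank lines."""
    fields = [f.strip() for f in line.strip().split(",")]
    fields = [f for f in fields if f]  # drop empty fields, e.g. from a trailing comma
    if not fields:
        return None
    return fields[0], set(fields[1:])


def load_hobbies(lines):
    """Map each user to the set of their hobbies; repeated users are merged."""
    hobbies_by_user = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            user, hobbies = parsed
            hobbies_by_user.setdefault(user, set()).update(hobbies)
    return hobbies_by_user


def find_circles(hobbies_by_user, x):
    """Return sorted (users, shared_hobbies) pairs, each a sorted list of strings.

    Assumed definition: for every hobby set S that is the intersection of
    two or more users' hobbies and has len(S) >= x, the circle is every
    user whose hobbies contain S, and S is the circle's shared hobbies.
    """
    candidates = set()  # frozensets: intersections of one or more users' hobby sets
    for hobbies in hobbies_by_user.values():
        h = frozenset(hobbies)
        new = set()
        if len(h) >= x:
            new.add(h)
        for c in candidates:
            common = c & h
            if len(common) >= x:  # intersections only shrink, so smaller sets are pruned
                new.add(common)
        candidates |= new
    circles = []
    for shared in candidates:
        members = [u for u, hs in hobbies_by_user.items() if shared <= hs]
        if len(members) >= 2:  # a circle needs at least two users
            circles.append((sorted(members), sorted(shared)))
    circles.sort()
    return circles


def format_circle(circle):
    """Build 'u1,u2,...<TAB>h1,h2,...' as required for circles.txt."""
    users, shared = circle
    return ",".join(users) + "\t" + ",".join(shared)


def find_popular(circles, y):
    """Return sorted (user, count) pairs for users in at least y circles."""
    counts = {}
    for users, _ in circles:
        for u in users:
            counts[u] = counts.get(u, 0) + 1
    return sorted((u, n) for u, n in counts.items() if n >= y)


def write_results(circles, popular):
    """Write circles.txt and popular.txt to the current directory."""
    with open("circles.txt", "w") as out:
        for circle in circles:
            out.write(format_circle(circle) + "\n")
    with open("popular.txt", "w") as out:
        for user, n in popular:
            out.write("%s\t%d\n" % (user, n))
```

Keeping the circles as a list of (users, hobbies) pairs is one way to follow the hint in (vi.): find_popular only needs that list. Building candidates by repeated intersection avoids trying every combination of x hobbies, but the number of candidates can still grow quickly on large inputs, so treat this as a starting point rather than a tuned solution.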
2- You should develop this project on the Linux machine (the Cloudera virtual machine) that you installed at the beginning of this semester. Do not rely on any special packages or libraries beyond the default compilers and libraries.
3- Name the solution file [url removed, login to view], [url removed, login to view], or facebook.scala.
4- Only one code file should be submitted per group. Your code should start with a comment block.
5- This comment block should contain:
a. Students' names, IDs, and sections.
6- You have to make sure that your code runs error-free; in particular, it must have no compilation errors.
a. We will not debug or fix any errors. Code that does not run should expect a very low score.
7- Be careful with path names.
a. Always assume the current folder/directory (use relative file names such as hobbies.txt).
8- The command to run your code would be similar to: python2.6 [url removed, login to view] 5 6
a. 5 is the value of x in step (a.), and 6 is the value of y in step (b.).
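The header, path, and command-line requirements above can be illustrated with a short Python skeleton. The header layout, the check that X and Y are positive, and the names parse_args, main, and INPUT_FILE are assumptions for illustration; the placeholders in the header comment must be replaced with real details.

```python
# Hypothetical header layout (replace the placeholders with real details):
# Students: <name> (<id>, section <section>), <name> (<id>, section <section>)
# Run as: python <solution file> X Y
#   X: minimum number of shared hobbies per circle (step a.)
#   Y: minimum number of circles for a popular user (step b.)

import sys

# A plain file name, so it resolves against the current working directory.
INPUT_FILE = "hobbies.txt"


def parse_args(argv):
    """Return (x, y) from argv[1] and argv[2]; raise ValueError on bad input."""
    if len(argv) != 3:
        raise ValueError("usage: %s X Y" % argv[0])
    x, y = int(argv[1]), int(argv[2])  # int() raises ValueError for non-numbers
    if x < 1 or y < 1:  # assumption: both thresholds must be positive
        raise ValueError("X and Y must be positive integers")
    return x, y


def main(argv):
    """Validate the arguments, read hobbies.txt, and return an exit status."""
    try:
        x, y = parse_args(argv)
    except ValueError as err:
        sys.stderr.write("%s\n" % err)
        return 1
    with open(INPUT_FILE) as f:
        lines = [line for line in f if line.strip()]
    # Steps (a.) and (b.) would process `lines` here using x and y.
    sys.stdout.write("read %d non-blank lines; x=%d, y=%d\n" % (len(lines), x, y))
    return 0
```

A complete script would end by calling sys.exit(main(sys.argv)) inside the usual if __name__ == "__main__": guard, so that a run such as python2.6 <solution file> 5 6 reads hobbies.txt from the current directory with x = 5 and y = 6, and a bad argument list produces a non-zero exit status instead of a traceback.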