ACG Machine Learning Project

17 November 2020 by Bailey Jones

Good morning (or whenever you are viewing this)! I am writing today to describe my experience with the A Cloud Guru Machine Learning #CGC. Let me start by saying that although the project is complete, the results are not quite what I was expecting, so I am going to keep working on my ML skills to better understand and manipulate my results. My background with Python is limited, so this project presented a learning curve from the jump. I downloaded the IMDb datasets to my computer and cleaned them up in VSCode, mainly because it's safer to break something locally than in the cloud! Stack Overflow was a huge help for this portion of the project. As I mentioned in a previous blog post, my weakness is that once I start something I typically end up staying up until 4 or 5 in the morning until it's done, so I actually had this part done within the first couple of days.
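As an illustration of the kind of local cleanup described above, here is a minimal sketch. It is not the code from the project itself. It assumes a tab-separated file shaped like IMDb's title.basics.tsv, where missing values are written as \N; the sample rows and the clean_titles helper are made up for the example.

```python
import csv
import io

# Hypothetical sample rows in the shape of IMDb's title.basics.tsv
# (only a few columns shown); this is not real project data.
SAMPLE_TSV = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\tgenres\n"
    "tt0000001\tmovie\tExample One\t1999\tAction,Drama\n"
    "tt0000002\tshort\tExample Two\t2001\tComedy\n"
    "tt0000003\tmovie\tExample Three\t\\N\tDrama\n"
    "tt0000004\tmovie\tExample Four\t2010\t\\N\n"
)

MISSING = r"\N"  # marker used for missing values in the IMDb TSV files


def clean_titles(tsv_text):
    """Keep only movies that have a known year and at least one genre."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    cleaned = []
    for row in reader:
        if row["titleType"] != "movie":
            continue  # drop shorts, TV episodes, and so on
        if row["startYear"] == MISSING or row["genres"] == MISSING:
            continue  # drop rows with missing values
        cleaned.append({
            "tconst": row["tconst"],
            "title": row["primaryTitle"],
            "year": int(row["startYear"]),
            "genres": row["genres"].split(","),
        })
    return cleaned


movies = clean_titles(SAMPLE_TSV)
```

Working on a small sample like this first makes it cheap to get the filtering rules right before running them on the full download.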

Next, I took a break from the project after exhausting myself with an influx of unfamiliar Python code and began building this website. I won't go into detail about the site, as I already have in this blog post, but it took about 10 days, and then I was able to return to the CGC project. This project actually served as a deadline for completing my website, as I took the "create a blog post" step quite literally and had to have a place to write about my experience!

Finally, it was time to upload my code to a Jupyter Notebook hosted on Amazon SageMaker. My initial plan was to upload the code I built locally and break it down into steps within the Jupyter Notebook, but it spat out nothing but errors from the start. I decided to go with a scikit-learn algorithm and built a random forest for recommendations. As I mentioned at the beginning of this post, the classifier does not recommend movies the way I had envisioned, but that is where my lack of experience comes in. If you have viewed my Jupyter Notebook, you have probably already figured out the clustering and recommending approach I was attempting. With that being said, any constructive criticism would be more than helpful! I will include helpful links as well as a link to my entire project below. Thank you all for reading!
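For readers who have not opened the notebook, one common way to combine clustering with a random forest can be sketched as follows. This is an assumption about the general shape of such an approach, not the notebook's actual code: the toy titles, the feature columns, and the recommend helper are all hypothetical. It groups movies with KMeans, trains a RandomForestClassifier to predict the cluster, and then recommends other titles from the predicted cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: columns are [is_action, is_comedy, is_drama, rating].
titles = ["A", "B", "C", "D", "E", "F"]
features = np.array([
    [1, 0, 0, 7.9],
    [1, 0, 0, 8.1],
    [0, 1, 0, 6.0],
    [0, 1, 0, 6.2],
    [0, 0, 1, 7.0],
    [0, 0, 1, 7.2],
])

# Step 1: group similar movies into clusters (unsupervised).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(features)

# Step 2: train a random forest to predict a movie's cluster from its features.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(features, cluster_labels)


def recommend(liked_index):
    """Return the other titles in the cluster predicted for the liked movie."""
    predicted = forest.predict(features[liked_index:liked_index + 1])[0]
    return [
        title
        for i, title in enumerate(titles)
        if cluster_labels[i] == predicted and i != liked_index
    ]


print(recommend(0))
```

In a sketch like this the quality of the recommendations depends almost entirely on the features and the number of clusters, so those are reasonable first places to look when the output does not match expectations.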

Links:

- ACG hands-on lab Introduction to Jupyter Notebooks (AWS SageMaker)
- LA hands-on lab scikit-learn Random Forest Classifier (AWS SageMaker)
- My project GitHub Repository