How to Build a Data Science Portfolio

Becoming a data scientist takes more than learning: you also need to plan how you will showcase your skills, accomplishments, and knowledge.


A portfolio is an important way for data scientists to build and establish connections. It is always helpful to take stock of the skills you have mastered (or are still learning) and, based on those skills, build a portfolio that supports your applications for internships and jobs.

"If you are an aspiring data scientist just starting your career, work on personal projects in addition to whatever you do to learn more," says Dr. Ingo Mierswa, an industry-veteran data scientist and founder of Altair RapidMiner, a no-code data science platform. "And you should choose projects related to very personal problems or topics you're passionate about."

For instance, if you're into sports, find data about the sport you love, whether it's NFL data or baseball data, or pick whatever data interests you and try to predict an outcome from it. "Analyze how the team performs over the season; take something you're passionate about, and you will likely find some data about that," Mierswa suggests. "Being passionate about this keeps you going. And you also need to be a bit of an expert, a domain expert for that."

"There is nothing else you'll learn more from than from these types of projects," Mierswa remarked during our call, "and I'm sometimes shocked that people are coming fresh from college, never worked on any project, then start working, and the only projects they're working on are what they're paid for. But there's typically not much passion and also less learning progress. So, for aspiring data scientists, I always encourage them to pick personal passion projects in addition. And then, if you make a little bit more progress in your career right now..."

You Need To Showcase Your Skills

The requirements in data science job postings make it hard for any applicant to stand out. So how do you get a foot in the door with no data science experience?

Your public portfolio, as the backbone of your skills, will strengthen your resume, accelerate your job hunt, and help you apply confidently for a wide variety of entry-level data science jobs.

"For the initial resume screening by HR, it's imperative to highlight projects completed in academic settings or also voluntary work on platforms like Kaggle," says Varun Mandalapu, a Senior Data Scientist in the Insurtech domain. "The reason being, even for internships, companies seek hands-on experience and specifics about the types of modeling involved. But degrees or other accomplishments outside data science rarely convey this."

"If you're making a one-page resume, using at least half a page to showcase the projects you worked on while learning in this field can be super effective," he advises. "But keep it simple and don't overload it with too many or overly complex projects, or it might otherwise dilute the impact you're aiming for."

Working on data science projects once you have learned the basic technical skills can be exhilarating. You'll naturally feel more inclined to sharpen your analytical skills.

Work On Projects and Create Reusable Solutions

Working through a data science project will cement your knowledge and build your confidence in talking about it. Mandalapu explains, "Having projects on your portfolio can be a powerful way to open doors to internship opportunities because people who find your projects interesting will also look through your portfolio in order to gauge your skills, experience, and interests, and may even recommend you for contributions."

A portfolio can also help you connect with other data science and ML enthusiasts on platforms like Kaggle, Reddit, and Stack Overflow, where you can find valuable advice and mentorship. "Building a portfolio is all about opening doors," he says. "Showcase your best project work and wait patiently for those opportunities to come knocking."

When you're learning new data science skills, go the extra mile to contribute and showcase your work. This includes working on projects that use public datasets and sharing your progress consistently. Making your projects reusable through platforms like GitHub and writing articles that explain your findings are particularly effective strategies.

Data scientists are naturally curious about each other's work, eager to learn what algorithms and methods were used to achieve specific results. By sharing your learnings and insights, you'll not only stand out from the crowd but also contribute to the data science community's collective knowledge base.

—Creating Projects

As a beginner, you can start with easy projects and observe how your peers create well-documented projects and communicate the quality of their analyses.

Projects are not substitutes for your work experience, but if you dedicate time to improving your skills, you can show the expertise that most people gain through work experience. As you learn data science through projects, cultivate the habit of documenting your work on platforms like GitHub and Deepnote.

—Document Everything

Portfolio projects that capture the most attention are those that are well-documented. Documentation can make or break the success of your projects and your portfolio overall.

Code quality is of paramount importance for relevance and clarity. If your work is not simple, it is not exceptional.

Here is an example of elegant Python code.

[Image: Python code snippet, "Writing clean code to showcase on a data science portfolio"]

This code snippet is human-readable, with comments that explain the use of each function and variable in few words.
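As an illustration of these qualities, here is a minimal sketch of readable, briefly commented Python. It is not taken from any particular portfolio: the `Game` class, the `win_rate` and `running_win_rate` functions, and the scores are all hypothetical, chosen to match the sports example earlier in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    """One game result from the team's point of view."""

    points_for: int
    points_against: int


def win_rate(games: list[Game]) -> float:
    """Return the share of games won, between 0 and 1.

    Ties count as non-wins. An empty list returns 0.0 instead of
    raising, so partial seasons can be charted safely.
    """
    if not games:
        return 0.0
    wins = sum(1 for g in games if g.points_for > g.points_against)
    return wins / len(games)


def running_win_rate(games: list[Game]) -> list[float]:
    """Return the win rate after each game, in season order."""
    return [win_rate(games[: i + 1]) for i in range(len(games))]


# Made-up scores, for illustration only.
season = [Game(24, 17), Game(10, 13), Game(31, 28), Game(20, 20)]

print(win_rate(season))  # 0.5 (two wins in four games)
print(running_win_rate(season))
```

Short functions, type hints, and one-line docstrings do most of the explaining here, so inline comments are only needed where a decision (such as how ties are counted) is not obvious from the code.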

A good portfolio project showcases both your technical and soft skills. Expanding on your projects through writing and showcasing your contributions will improve your chances of getting noticed by potential employers, since the purpose of a portfolio is to give a quick tour of your skills.

If you've spent hours scraping a public dataset for a specific task, you could also create a project repository to make your scraping tool accessible, then write an article that documents the entire process and demonstrates your technical skill set.

Here's an example of a great portfolio:

All the relevant information is on the homepage: "I am Chris Tran, a Machine Learning Engineer in Deep Learning, NLP, and Computer Vision. What else is there to know?" It is punchy and to the point. Chris has an educational background in statistical programming and machine learning.

The main thing to take from Chris Tran's approach is simplicity and organization. The portfolio section clearly shows that Chris has put careful thought into showcasing his skills by writing in-depth tutorials explaining every important detail for each project.

He also drives visitors from his project repositories to his writing: each repository has a clear, intuitive README with links to a topic-specific article that teaches the concepts involved in building the project. This is a brilliant way to maintain a healthy portfolio.

It's worth noting from this brief clip how Chris gets into detailed case studies on his website, where we also get to learn about his personality and communication skills.

Tip: Learn how to document your projects from GitHub's guide to READMEs.
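One way to keep READMEs consistent across repositories is to start each one from the same skeleton. The sketch below is only a suggestion: the section names, the `readme_skeleton` helper, the project title, and the example.com link are all assumptions for illustration, not a layout required by GitHub.

```python
# Section names are an assumption: one common README layout among many.
SECTIONS = ["Overview", "Data", "Method", "Results", "How to run", "Further reading"]


def readme_skeleton(title: str, summary: str, article_url: str = "") -> str:
    """Build README text with a title, a summary, and one heading per section."""
    lines = [f"# {title}", "", summary, ""]
    for section in SECTIONS:
        lines.append(f"## {section}")
        lines.append("")
        if section == "Further reading" and article_url:
            # Link back to the article that explains the project in depth.
            lines.append(f"- [Write-up of this project]({article_url})")
        else:
            lines.append("TODO")
        lines.append("")
    return "\n".join(lines)


text = readme_skeleton(
    "Season Win-Rate Analysis",  # hypothetical project name
    "Tracks how a team's win rate changes over a season.",
    article_url="https://example.com/win-rate-article",  # placeholder URL
)
print(text)
```

To save the result, write `text` to a `README.md` file at the root of the repository and replace each `TODO` with real content.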

—Publishing

Again, the most important aspect of publishing your work is code quality. Learn the best practices for writing programs effectively: what to include, what to avoid, how to strike the right balance, and why a given choice is the best one.

You might benefit from the book Fluent Python to learn specific ways to write better Python code. It's highly recommended in the developer community.

Your work will not go unnoticed. Honing your coding skills and learning from others will help you become a better researcher.

You could configure a local Jupyter environment with GitHub or Deepnote to publish your projects. The single-document approach of a Jupyter Notebook makes it easy to develop, visualize, and add the explanations and formulas that make your work more understandable, repeatable, and shareable.
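To make the single-document idea concrete, the sketch below builds a tiny notebook by hand: one Markdown cell with text and a formula, followed by one code cell. It assumes the version 4 notebook JSON layout; the `markdown_cell` and `code_cell` helpers and the cell contents are hypothetical. In practice Jupyter writes this file for you, or you can use the `nbformat` library instead of raw JSON.

```python
import json


def markdown_cell(text: str) -> dict:
    """Return a Markdown cell holding narrative text or formulas."""
    return {
        "cell_type": "markdown",
        "metadata": {},
        "source": text.splitlines(keepends=True),
    }


def code_cell(code: str) -> dict:
    """Return a code cell that has not been executed yet."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code.splitlines(keepends=True),
    }


# Narrative and code live side by side in one document.
notebook = {
    "cells": [
        markdown_cell("# Win-rate analysis\n\nWin rate: $w = wins / games$"),
        code_cell("wins, games = 2, 4\nprint(wins / games)"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Saving `notebook_json` to a file with the `.ipynb` extension should give a document that Jupyter can open, and GitHub renders committed notebooks in the browser.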

This is common practice among data scientists: it demonstrates that you have the technical skills and the ability to explain complex topics in a way that is easy to understand.

3 Tips for Building a Strong Presence

When building a professional portfolio, your goal should be to stand out and be one of a kind, not one of many.

These tips will help you persuade potential employers that you are uniquely qualified for a position.

Join Kaggle

Kaggle is the largest and most trusted online community for data scientists and machine learning enthusiasts. You can collaborate with other users, find and publish datasets, use GPU-integrated notebooks, and participate in competitions to solve data science challenges.

Employers pay a lot of attention to your Kaggle profile. A strong profile can bring you a lot of exposure, which can help you get an entry-level job.

Kaggle is a great place to learn machine learning. It's completely free, including all datasets, participation in competitions, and discussions. You can also connect with recruiters through the Jobs Board.

Datasets

It's a great platform to learn how to think and solve real-world problems. You can generate project ideas from real-world datasets, with over 160k of them to keep your motivation high throughout the learning process.

Competitions

Companies such as Google and American Express host Kaggle competitions. Your performance is a powerful way to stand out from the crowd and show your abilities in solving complex problems.

The competitions usually last for three months and offer anywhere from $10,000 to $150,000 in prize money. There are only 94 grandmasters in the world, and most of them have been using Kaggle for over two years.

Be open to sharp criticism; Kaggle offers aspiring data scientists the best chance to learn from qualified people for free.

The expertise you gain on Kaggle will be invaluable.

Always use GitHub

GitHub keeps track of your daily contributions. Your work is publicly visible, and people can see your working knowledge and commitment to data science.

You should make the most of GitHub. Data scientists use it widely because it hosts most open-source data science repositories, powerful libraries and packages, and plenty of other programming resources.

One of the best ways to highlight your skills is to have an active presence on GitHub. Having an active GitHub profile could open up tremendous collaboration or internship opportunities that you can also showcase in your portfolio.

You can host both code-based and content-based projects on GitHub.

Project example:

It’s clear at a glance where Chris Tran's skills lie: Python, machine learning, and building AI systems.

It's always good practice to push the code you've written to your GitHub profile regularly. You can create a static website like Chris Tran's with GitHub Pages to host your blog and portfolio for free.

You can easily customize your GitHub profile page, add links to your articles, and showcase your projects. It's best to link your GitHub, LinkedIn, and Kaggle profiles.

It's easy to gain familiarity with Git and GitHub terminology, such as repository, branch, commit, pull request, etc. You can learn from the official guide or the recommended resources below.

Write as you learn

Data science blogs can be a fantastic way to improve your communication skills, present your analysis, uncover unique insights, and publish data visualizations.

While it's true that you can show your expertise through projects, you should also start writing tutorials as you grow. You can build readership if you write high-quality tutorials.

There is no one portfolio format that works best. The common denominator, though, is that you should focus on your specialty, skills, and notable accomplishments.

Your portfolio should have an intriguing description that drives people to check out your projects, tutorials, articles, etc.

Thanks for making it to the end...

If you liked this guide, we have a few practical data science resources for you: