Photo by the blowup on Unsplash

Kickstarter: to succeed or not to succeed?

Claudia
5 min readFeb 19, 2021

--

Kickstarter is a company that allows creative projects to come to life through crowdfunding.

People who become project backers are offered perks as rewards for pledging money towards the project. These perks are incentives to bring in more backers and thus fulfill the monetary goal of the project.

Within these 15 main categories, various sub-categories further identify a project.

From inception, Kickstarter was a notable and fast-paced vehicle for a project to streamline a way to accrue funds.

For this project, I scraped data from webrobots.

Data was gathered and cleaned so that it could be used for visualizations and machine learning models.

The Data

The original columns that came from the scraped data were plentiful.

While this does give us a lot of information on projects from 2009 until 2021, there were things that I had to drop, clean, or change completely for the data to give meaningful information.

Index(['backers_count', 'blurb', 'category', 'converted_pledged_amount', 'country', 'country_displayable_name', 'created_at', 'creator', 'currency', 'currency_symbol', 'currency_trailing_code', 'current_currency', 'deadline', 'disable_communication', 'friends','fx_rate', 'goal', 'id', 'is_backing', 'is_starrable', 'is_starred', 'launched_at', 'location', 'name', 'permissions', 'photo', 'pledged', 'profile', 'slug', 'source_url', 'spotlight', 'staff_pick', 'state', 'state_changed_at', 'static_usd_rate', 'urls', 'usd_pledged', 'usd_type'], dtype='object')

Cleaning data is where most time is spent, to purposefully clean one has to understand the data and the relation keeping it all together.

df.category.value_counts()  Comedy            9723 
Web 8820
Product Design 4000
Spaces 3774
Tabletop Games 3633
...
Quilts 100
Chiptune 56
Games 55
Kids 20
Video Art 7
Name: category, Length: 158

One of the many obstacles I faced when looking at the data was with a column named “category”; within that column, there were 158 unique categories.

This seemed a bit messy if I were to visualize category counts.

So after a little research, I added a new column called “main_category” to the dataset. It is made up of 15 possible categories that contain the 158 original categories.

Every sub-category now had a home in the main category and this new column could be better visualized.

df.main_category.unique()  array(['crafts', 'food', 'tech', 'publishing',     
'theater', 'design', 'music', 'art',
'games', 'photography', 'film & video',
'fashion','comics', 'dance', 'journalism'],
dtype=object)
Amount of Kickstarter Projects in All Categories from 2009–2020

Treemaps and Sunburst charts are visualizations that help to show relationships within data.

Both of these graphs illustrate the relation in size of each sub-category within their main category.

Success and Failure

Delving further into the data, we can see that within each sub-category, there is still another way to split the projects. We can look at which projects were successful and which failed to reach their goals.

Pie Chart showing 62% of all projects are successful and 38% are failures
nice, round pie

within the data:

  • 62% of projects are categorized as successful
  • 38% of all Kickstarter projects fail

I tend to lean towards pessimism, so finding that most projects are successful was surprising.

Let us not forget, Kickstarter is a business and if projects hosted on their platform succeed, Kickstarter succeeds.

An overlayed histogram can show how the main categories fair on the success scale.

How the main categories faired

Film and Music are the two most popular categories.

Journalism takes a hard hit in that most projects in that category seem to fail to meet their goal.

Crafts show a sliver of hope by just eking out a few more successes than failures.

Some stats

Min Goal and Pledged values
goal 0.01
usd_pledged 0.00
dtype: float64

Mean Goal and Pledged values
goal 47371.54
usd_pledged 14705.32
dtype: float64

Median Goal and Pledged values
goal 5000.0
usd_pledged 1891.0
dtype: float64

Max Goal and Pledged values
goal 100000000.0
pledged 20338986.27
dtype: float64

Std Goal and Pledged values
goal 1109321.83
usd_pledged 109739.24
dtype: float64

Time and Projects

Now then, what are these projects asking for from backers? MONEY.

$5,000 and $10,000 were the most popularly monetary goals for projects. Followed by $1,000, $2,000, and$3,000. Most of these goals could be reached fairly quickly with enough backers.

The sooner the pledge is over, the sooner the project could get underway and deliver what was promised to backers as well as go to market as a product.

40% of projects have a deadline of 30 days to complete their funding goal. However, a project reaching its monetary goal does not ensure that the project will fulfill its promises. Projects have had to refund pledged money for not coming through with promises made.

Duration of how long pledges were open in days

Kickstarter has gained popularity over the years. The spread of Covid-19 in 2020 has sparked more interest in projects being launched and subsequently acquiring backers and reaching their goal. This is something that can be done entirely online and allows for backers to feel like they are part of a community, doing good for others or making themselves useful.

Amount of projects launched yearly

A wordcloud using text from projects’ description gives a bit more insight into the data.

As shown in earlier graphs, music and film were the most popular project categories.

In this wordcloud, Album and Film are pretty prominent in the displayed text.

Predictions

Time to apply machine learning to the data.

After a little encoding, it is time for splitsville. The data is divided into a training set and a testing set, in the usual 80:20 split.

For this classification problem, I went with Random Forest, Logistic Regression, and an XGBoost to see which would give me the best accuracy scores.

Scores on testing data:

Random Forest: 0.7761014171406536

Logistic Regression: 0.6619343389529725

XGBoost: 0.7360026619343389

The Random Forest model has the best score, now let's try to pump it up.

After some hyperparameter tuning, I was able to get the best parameters for the model on the training data.

rfc best params: {'max_depth': 22, 'n_estimators': 1000, 'n_jobs': -1}
rfc scores: 0.7837684885154027

Performance on test data:

rfc accuracy score 0.7894853593611357

I found the Kickstarter data to be highly intriguing, I love to see data that has been compiled over several years. Trends and history are stored in that type of data and it takes a good visual artist to be able to extract the story within.

--

--