Post by NJtoTX on Sept 12, 2017 22:42:52 GMT
Post by Admin on Sept 13, 2017 0:13:35 GMT
It's evolution, baby!
Post by cupcakes on Sept 13, 2017 18:24:03 GMT
Post by Terrapin Station on Sept 13, 2017 18:32:53 GMT
Yeah, I was wondering that, too. They have terms like "paperback," "cuerpo," and "like diamond" on there. There's no way those were among the most common terms in lyrics.
Post by Deleted on Sept 13, 2017 23:33:10 GMT
Because once everything's been done enough times, it's deemed necessary to push the envelope to stay relevant.
Post by NJtoTX on Sept 13, 2017 23:39:07 GMT
I used the XML and RCurl packages to scrape song and artist names from each Wikipedia entry. I then used that list to scrape lyrics from sites with predictable URL strings (for example, metrolyrics.com uses metrolyrics.com/SONG-NAME-lyrics-ARTIST-NAME.html). If the first site's scrape failed, I moved on to the second, and so on. About 78.9% of the lyrics were scraped from metrolyrics.com, 15.7% from songlyrics.com, and 1.8% from lyricsmode.com. About 3.6% (187/5100) were unavailable.

The dataset has 5100 observations with the features rank (1-100), song, artist, year, lyrics, and source. The artist feature is fairly standardized thanks to Wikipedia, but there is still quite a bit of noise when it comes to artist collaborations (Justin Timberlake featuring Timbaland, for example). Errors in the scraped lyrics, such as spelling mistakes or variants like "nite" instead of "night," haven't been corrected.

The method seems to be: attribute the lyrics to a decade, break them into ngrams, calculate the log likelihood for each ngram/decade pair, and rank those scores to produce a list of the most characteristic words/phrases for each decade. kaylinwalker.com/50-years-of-pop-music/
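The pipeline described above can be sketched roughly like this. The original work used R's XML and RCurl packages; this is a minimal Python sketch, and both the slug rules in `metrolyrics_url` and the use of Dunning's log-likelihood (G2) in `characteristic_terms` are my assumptions about the details, not the author's actual code.

```python
import math
import re
from collections import Counter

def metrolyrics_url(song, artist):
    """Build the predictable URL pattern quoted above:
    metrolyrics.com/SONG-NAME-lyrics-ARTIST-NAME.html
    (the exact slug rules are an assumption)."""
    slug = lambda s: re.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")
    return f"http://www.metrolyrics.com/{slug(song)}-lyrics-{slug(artist)}.html"

def log_likelihood(a, b, c, d):
    """Dunning log-likelihood (G2) for a term occurring `a` times in a
    target corpus of `c` tokens and `b` times in a reference corpus of
    `d` tokens; higher means more characteristic of the target."""
    e1 = c * (a + b) / (c + d)  # expected count in target corpus
    e2 = d * (a + b) / (c + d)  # expected count in reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def characteristic_terms(decade_tokens, other_tokens, top_n=10):
    """Rank the terms most characteristic of one decade's lyrics
    against all other decades pooled together."""
    target, ref = Counter(decade_tokens), Counter(other_tokens)
    c, d = sum(target.values()), sum(ref.values())
    scores = {t: log_likelihood(target[t], ref.get(t, 0), c, d) for t in target}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy example: "twist" is overrepresented in the 60s sample.
print(metrolyrics_url("Twist and Shout", "The Beatles"))
print(characteristic_terms("twist and shout twist again let's twist".split(),
                           "love love baby love night baby yeah".split(),
                           top_n=1))
```

The same scoring extends from single words to bigrams or trigrams by counting ngrams instead of individual tokens before passing them in.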