results for under over look glitchy

Let us know when something isn't working correctly, or if you find a typo. Do not post complaints or suggestions here.
fluffy
Eruption
Posts: 11029
Joined: Sat Sep 25, 2004 10:56 am
Instruments: sometimes
Recording Method: Logic Pro X
Submitting as: Sockpuppet
Pronouns: she/they
Location: Seattle-ish

Re: results for under over look glitchy

Post by fluffy »

There's a bunch of stupid shit going on in a lot of different places. Yet another reason that I'd really like to move to an actual database, rather than a bunch of flat files where things get injected sometimes directly and sometimes not, is that we could then formalize the way that things work.

Unfortunately, there is no clear solution that will both allow embedded HTML (e.g. x<sub>0</sub>) AND prevent < and & and such from potentially screwing things up. Personally I'd rather just not allow HTML in artist names since that clears the problem up instantly, and then X<sub>0</sub> could just use a Unicode subscript 0 instead.

The fix I've put in for now is to change archive.txt to use &lt; instead of <, and that just magically seems to be still showing up as _3C in the key names. I'm not changing the archive code or anything.

Anyway, thank you for bringing these issues to everyone's attention so that they can be fixed.
Manhattan Glutton
Ice Cream Man
Posts: 1530
Joined: Tue Feb 15, 2005 12:10 pm
Instruments: Angst
Recording Method: REAPER
Location: Madison, WI

Post by Manhattan Glutton »

It seems there's only one name that has HTML in it - can we just retroactively change it and be consistent going forward? Better yet, is there maybe a unicode character that does what the HTML does? (EDIT: YES &#8320; Test: X₀)
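The swap from `<sub>` markup to a Unicode subscript could be scripted. This is only a minimal sketch, assuming the markup is always a plain `<sub>` wrapped around digits; `sub_to_unicode` is a made-up helper name and not part of the archive code.

```python
import re

# Map ASCII digits to the Unicode subscript digits U+2080..U+2089.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def sub_to_unicode(name: str) -> str:
    """Replace <sub>digits</sub> with Unicode subscript digits (hypothetical helper)."""
    return re.sub(
        r"<sub>(\d+)</sub>",
        lambda m: m.group(1).translate(SUBSCRIPT_DIGITS),
        name,
        flags=re.IGNORECASE,
    )


print(sub_to_unicode("X<sub>0</sub>"))  # X₀
```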

All these changes are breaking my archive scraper. :)
If I had a dollar for every one of my songs j$ has called a 90s pastiche, I'd have $1 for every song I've written.

Nur Ein Archives | The New Ugly Podcast

Post by fluffy »

Yep, unfortunately entities still lead to some potential inconsistencies, like if people want & in their name (which comes up a LOT). Using &amp; for that is only a partial solution, which leads to even more fun inconsistencies down the road. Personally I feel that the internal storage for all band names should be raw text and that should be converted to HTML, XML, etc. by the display layer. Unfortunately, that philosophy doesn't work when you want a subscript 0 and your raw text is encoded ISO-8859-1.

Unfortunately, we can't just switch to UTF-8 either - that leads to all sorts of other problems, not the least of which is that most software (email clients, Dreamhost's Apache configuration, etc.) STILL assumes ISO-8859-1.
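To make the "raw text in storage, escaping in the display layer" idea concrete, here is a rough sketch using only the Python standard library. The sample names and the helper names `to_html` and `to_xml` are illustrative assumptions, not anything from the actual archive code. The last few lines show the limitation mentioned above: a subscript zero cannot be encoded as ISO-8859-1 at all.

```python
import html
from xml.sax.saxutils import escape as xml_escape

# Names are stored as raw text, exactly as the artist typed them (made-up samples).
raw_names = ["Me & You", "<3", "So<kpupp3t"]


def to_html(name: str) -> str:
    """Escape a raw name for use in an HTML page."""
    return html.escape(name)


def to_xml(name: str) -> str:
    """Escape a raw name for XML output such as an RSS feed."""
    return xml_escape(name)


for name in raw_names:
    print(to_html(name), "|", to_xml(name))

# The catch: U+2080 (subscript zero) has no ISO-8859-1 encoding.
try:
    "X\u2080".encode("iso-8859-1")
except UnicodeEncodeError:
    print("subscript zero cannot be stored as ISO-8859-1")
```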

Post by Manhattan Glutton »

fluffy wrote:most software (email clients, Dreamhost's Apache configuration, etc.) STILL assumes ISO-8859-1.
Doesn't it look for the byte order mark at the beginning of the files? And what is particularly important about maintaining compatibility with those applications, anyway?

Post by fluffy »

Most software that has encoding problems is written without an awareness that encodings are something it has to care about.

Software that does handle encodings uses the MIME type or similar to provide encoding information, but software like that isn't the issue.

Also, BOMs are a really bad, hackish way to do things. They're only very inconsistently supported, and they lead to more problems than they fix.
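A small illustration of the BOM inconsistency, as a sketch only; the sample text "Band" is made up. The same bytes decode cleanly, keep a stray invisible character, or pick up visible junk, depending on what the reader assumes about the encoding.

```python
import codecs

# UTF-8 bytes with a byte order mark prepended (sample text is made up).
with_bom = codecs.BOM_UTF8 + "Band".encode("utf-8")

# A BOM-aware decoder strips the mark.
print(repr(with_bom.decode("utf-8-sig")))   # 'Band'

# A plain UTF-8 decoder keeps it as an invisible U+FEFF character.
print(repr(with_bom.decode("utf-8")))       # '\ufeffBand'

# Software assuming ISO-8859-1 shows three junk characters instead.
print(repr(with_bom.decode("iso-8859-1")))  # 'ï»¿Band'
```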

Post by Manhattan Glutton »

Manhattan Glutton wrote:And what is particularly important about maintaining compatibility with those applications, anyway?
It seems like the database only has to be compatible with itself... which is somewhat of a problem right now anyway?

Post by fluffy »

There is no database. That's a big part of the problem. If the data were stored in a database in some consistent and normalized form it would be a lot simpler to ensure compatibility going forward. But it isn't, and getting to that point would be a huge amount of work.

This whole conversation, by the way, has gotten quite repetitive.

Post by Manhattan Glutton »

I think that's because I must not be communicating clearly and so you keep repeating things instead of answering what I thought I asked. I'm used to it.

Post by Manhattan Glutton »

So I guess the big question on my mind, to be incredibly explicit, is why this isn't the simple fix?
1. Rename X<sub>0</sub> so that it does not contain HTML. (AKA, rename the only existing artist with an HTMLy name. Tough luck.) Or hey, even don't rename him, I don't care.
2. Continue storing the names as latin-1 plaintext.
3. HTML/URL encode names when displaying them.

Won't that solve like 99% of the problems here? What am I missing?
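Steps 2 and 3 above could look roughly like this. It is a sketch under assumptions: `display_forms` is a hypothetical helper, and percent-encoding the URL as UTF-8 (the default for `urllib.parse.quote`) is a choice made here, not something the thread settles.

```python
import html
from urllib.parse import quote


def display_forms(stored: bytes) -> tuple[str, str]:
    """Return (HTML-safe text, URL path segment) for a name stored as latin-1 bytes."""
    name = stored.decode("iso-8859-1")  # step 2: storage stays latin-1 plaintext
    # Step 3: encode only at display time, once per output format.
    return html.escape(name), quote(name, safe="")


print(display_forms(b"Me & You"))  # ('Me &amp; You', 'Me%20%26%20You')
```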

Lunkhead
You're No Good
Posts: 8107
Joined: Sat Sep 25, 2004 12:14 pm
Instruments: many
Recording Method: cubase/mac/tascam4x4
Submitting as: Berkeley Social Scene, Merisan, Tiny Robots
Pronouns: he/him
Location: Berkeley, CA

Post by Lunkhead »

fluffy wrote:There is no database.
There -is- a database! ;) And in that database the only character that isn't working right is the upside down exclamation point in "¡Juiceharp!" which doesn't display properly in the browser sadly. I would love to have some guidance on how to fix that, btw.
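A common cause of this kind of symptom is a mismatch between the encoding the bytes were written in and the encoding the browser is told to expect. Whether that is what is happening with this name is an assumption; the sketch below only shows what each kind of mismatch looks like.

```python
name = "¡Juiceharp!"

latin1_bytes = name.encode("iso-8859-1")  # b'\xa1Juiceharp!'
utf8_bytes = name.encode("utf-8")         # b'\xc2\xa1Juiceharp!'

# ISO-8859-1 bytes read as UTF-8: 0xA1 alone is invalid, so it becomes U+FFFD.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(as_utf8)

# UTF-8 bytes read as ISO-8859-1: the two-byte sequence shows up as two characters.
as_latin1 = utf8_bytes.decode("iso-8859-1")
print(as_latin1)  # Â¡Juiceharp!
```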

Maybe we need to flip things on their head and have the sfjukebox.org database be the system of record, holding the data that drive the songfight.org archive? I'd be happy to facilitate that. I've got RESTful Web services already, although since no one is consuming them, I can't guarantee they currently output every piece of data needed properly, and for now I've got JSON and CSV but not XML output. I can pretty easily update them to output whatever data people want in whatever format people want and can have different services for different consumers if need be. I'm pretty confident that my app can provide the most flexible storage and access to the data.

EDIT: Added fluffy's comment to which I was responding.

Post by Manhattan Glutton »

Manhattan Glutton wrote:Won't that solve like 99% of the problems here? What am I missing?
Looks like someone already did that without me knowing. Cool!!!

For the record (pun unintended), I was using the term "database" loosely before.

Post by Lunkhead »

The only reason you didn't know was because you didn't ask. ;) I'd be happy to give you an export too in a format that you could just request from one URL and parse very easily, artist profile data included.

Post by Lunkhead »

fluffy wrote:Unfortunately, there is no clear solution that will both allow embedded HTML (e.g. x<sub>0</sub>) AND prevent < and & and such from potentially screwing things up.
I'm not entirely clear on how it wouldn't solve the problem by allowing HTML in band names (a limited "safe" subset of tags anyway) and then escaping >, <, and & with &gt;, &lt;, and &amp;? I'm not saying I think you're wrong, just saying I don't understand completely your explanation for why that's insufficient.

Post by Lunkhead »

Also, I think there shouldn't be any HTML allowed in fight titles, and that the "<BR>" from the "Rockopolous..." title should be removed. That kind of formatting stuff ought to be handled in the view, not in the data.

Post by Manhattan Glutton »

But Saaaaam! I worked so hard on my scraper!!! These are some *pro* regexps! ;) Maybe someday I'll give up on it.

You know what would be really kickass if someone made? An intelligent diff of the archive that says "these are new artists, and this is the latest fight" or something of that sort to make my bot easier. I haven't worked out that kink yet. :)
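The kind of diff described above could be sketched as a comparison of two snapshots. Everything here is hypothetical: the snapshot shape (fight title mapped to a set of artist names), the function name `diff_archive`, and the placeholder data. It is not how the archive is actually stored.

```python
def diff_archive(old: dict[str, set[str]], new: dict[str, set[str]]):
    """Compare two snapshots mapping fight title -> set of artist names.

    Returns (new fight titles, artists not seen in the old snapshot).
    """
    new_fights = sorted(new.keys() - old.keys())
    old_artists = set().union(*old.values())
    new_artists = set().union(*new.values())
    return new_fights, sorted(new_artists - old_artists)


# Placeholder data, purely for illustration.
old = {"Fight A": {"Artist 1", "Artist 2"}}
new = {
    "Fight A": {"Artist 1", "Artist 2"},
    "Fight B": {"Artist 2", "Artist 3"},
}
print(diff_archive(old, new))  # (['Fight B'], ['Artist 3'])
```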

Post by Lunkhead »

Well, luckily for you, I have implemented code to do that. My code is basically a synch with the songfight.org data. I do not do a full delete of my local data followed by a full import of the songfight.org data. That would be crazy! ;)

All my data can be queried, too. So if what you're saying is you want artists/fights/songs added since a certain timestamp (say the last time you queried my service) you could do that pretty easily.
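As a rough picture of a "since a certain timestamp" query, here is a client-side sketch. The record shape, the `added` field, and the dates are all invented for illustration; none of it describes the real service's interface.

```python
from datetime import datetime, timezone


def added_since(records, since):
    """Return records whose 'added' timestamp is strictly later than `since`."""
    return [r for r in records if r["added"] > since]


# Made-up records; the field names are assumptions, not a real schema.
records = [
    {"artist": "Artist 1", "added": datetime(2012, 1, 1, tzinfo=timezone.utc)},
    {"artist": "Artist 2", "added": datetime(2012, 3, 1, tzinfo=timezone.utc)},
]
last_checked = datetime(2012, 2, 1, tzinfo=timezone.utc)

print([r["artist"] for r in added_since(records, last_checked)])  # ['Artist 2']
```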

Post by fluffy »

Lunkhead wrote:
fluffy wrote:Unfortunately, there is no clear solution that will both allow embedded HTML (e.g. x<sub>0</sub>) AND prevent < and & and such from potentially screwing things up.
I'm not entirely clear on how it wouldn't solve the problem by allowing HTML in band names (a limited "safe" subset of tags anyway) and then escaping >, <, and & with &gt;, &lt;, and &amp;? I'm not saying I think you're wrong, just saying I don't understand completely your explanation for why that's insufficient.
Because allowing limited HTML is fragile and inconsistent and requires a lot of extra code to deal with weird cases like someone actually trying to have their name be <sub>, for example. Embedded HTML also causes huge problems for things that aren't HTML, such as RSS feeds (such as the podcast) and, you know, wikis.

HTML is great if you're only dealing with HTML, but nothing exists in an HTML vacuum anymore.
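The `<sub>` edge case above can be shown with a small sketch of an allowlist escaper. The allowlist and the function name `escape_with_allowlist` are assumptions for illustration. The approach escapes everything, then restores only bare allowed tags, and it cannot tell an artist who is literally named `<sub>` from real markup.

```python
import html
import re

ALLOWED_TAGS = ("sub", "sup")  # assumed allowlist, for illustration only


def escape_with_allowlist(name: str) -> str:
    """Escape &, <, > and then restore bare allowed tags (no attributes)."""
    escaped = html.escape(name, quote=False)
    pattern = r"&lt;(/?)(%s)&gt;" % "|".join(ALLOWED_TAGS)
    return re.sub(pattern, r"<\1\2>", escaped)


print(escape_with_allowlist("X<sub>0</sub>"))  # X<sub>0</sub>
print(escape_with_allowlist("So<kpupp3t"))     # So&lt;kpupp3t
print(escape_with_allowlist("<sub>"))          # <sub>  (treated as markup, not as a name)
```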

Post by Lunkhead »

Sure. I wouldn't think that just "allowing HTML" would work. I'm more just thinking of what some sites do where a few specific tags are allowed. I'm also not envisioning any kind of programmatic enforcement of that in the current system. It would have to be manual, basically an agreement of the people handling fight submission intake to tell submitters "Sorry, only the following tags are allowed in artist names: ..." Also they'd have to assume that people who submit artist names containing invalid HTML with <, >, and & in them (e.g. "<sub>", "<3", "So<kpupp3t") want them to show up as given, and would manually encode those characters appropriately ("&lt;sub&gt;", "&lt;3", "So&lt;kpupp3t") before entering the artist names into the data files.

Post by Lunkhead »

What I do in cases when I want to use the artist name but HTML tags aren't allowed, like using the artist name in the <title></title> of a page, is just strip out the tags. It's not possible to show the name the way the artist wants anyway at that point, so it seems OK to me to just rip out the annoying bits.

http://sfjukebox.org/artists/X%3Csub%3E0%3C/sub%3E
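Stripping tags for a `<title>` could be done along these lines. This is a naive sketch, not the code behind the page linked above; `strip_tags` and `title_text` are made-up names, and the regex only removes things that start like a tag and have a closing `>`.

```python
import html
import re


def strip_tags(name: str) -> str:
    """Remove anything that looks like an HTML tag (naive, regex-based)."""
    return re.sub(r"</?[A-Za-z][^>]*>", "", name)


def title_text(name: str) -> str:
    """Strip tags, then escape what is left so it is safe inside <title>."""
    return html.escape(strip_tags(name), quote=False)


print(title_text("X<sub>0</sub>"))  # X0
print(title_text("<3"))             # &lt;3
```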

The same could be done in the podcast content, and even in the wiki, though I'd argue that something that's ultimately rendering HTML anyway ought to have some way of dealing with HTML.

Anyway, I'm just saying it's possible to deal with it to a degree. I'm not necessarily in favor of it but it's not up to me and it seemed like the decision had been made that things would be the way they are (that one name has HTML, while other names have unescaped <, >, and & in them). I'd really just love it if the data were consistent one way or the other (either no HTML, or no unescaped <, >, and &).

Post by Manhattan Glutton »

Yeah it seems consistency is the real issue. If we just pick one way and stick with it, that would be fantastic. I'm throwing my vote in for no HTML whatsoever.

Spud
Hot for Teacher
Posts: 4770
Joined: Fri Sep 24, 2004 10:25 am
Instruments: Bass, Keyboards, eHorn
Submitting as: Octothorpe
Location: Seattle

Post by Spud »

Lunkhead wrote:Sure. I wouldn't think that just "allowing HTML" would work. I'm more just thinking of what some sites do where a few specific tags are allowed. I'm also not envisioning any kind of programmatic enforcement of that in the current system. It would have to be manual, basically an agreement of the people handling fight submission intake to tell submitters "Sorry, only the following tags are allowed in artist names: ..." Also they'd have to assume that people who submit artist names containing invalid HTML with <, >, and & in them (e.g. "<sub>", "<3", "So<kpupp3t") want them to show up as given, and would manually encode those characters appropriately ("&lt;sub&gt;", "&lt;3", "So&lt;kpupp3t") before entering the artist names into the data files.
What is the advantage of allowing any html in band names? If you're going to make a rule that says that some is and some isn't, then clearly you have rule-making authority, so why not just ban it?
"I only listen to good music. And Octothorpe." - Marcus Kellis
Song Fight! The Rockening

Post by fluffy »

Exactly. Why do a partial half-baked solution that only makes your life more difficult?