Dawg (May 19, 2010):
The sub for CSI Miami - 08x23 - Time Bomb.LOL.English.org contains some strange characters at the following timestamps:
00:00:42,632 --> 00:00:44,900
00:01:14,597 --> 00:01:16,865
honeybunny (May 19, 2010):
¶ ¶? These are some characters to suggest music :)
Dawg (May 19, 2010):
Quote (honeybunny): "¶ ¶? These are some characters to suggest music"
No, these are more like a capital T with an extended top bar and 2 lines right next to it.
honeybunny (May 19, 2010):
Lol, they don't look like that to me :)
Dawg (May 19, 2010):
Quote (honeybunny): "Lol they don't look like that to me :)"
OK, I'll bite: what do they look like to you? BTW, thanks for recommending VLC; it's saving me a lot of headaches caused by the DivX player.
honeybunny (May 20, 2010):
¶ ¶, like this. I posted them above; I wonder how you see them.
Dawg (May 20, 2010):
Quote (honeybunny): "¶ ¶ like this. I posted them above I wonder how you see this."
I see them as they're displayed here, and as they display in VLC. What I was looking at (the attachment above) was the "raw" (so to speak) character in the sub, so I thought it was an "unprintable" (again, so to speak) character, as I hadn't gotten around to watching that episode yet. Having now seen the episode, I saw that indeed (as you said) it was what some of you folks use to denote music. Question: why not use the actual music note symbol?
honeybunny (May 20, 2010):
You'd have to ask the guys at the captioning companies. We just sync the scripts.
Adriano_CSI (June 3, 2010):
lol
egk (September 13, 2010):
Quote (honeybunny): "You'd have to ask the guys at the captioning companies. We just sync the scripts."
This is an old thread, but I was wondering about this myself. Watching the True Blood finale in VLC, I notice the songs all have odd characters. I know they're supposed to be the characters signifying music, but they don't appear that way in VLC. I'm guessing that if I made a DVD they might be fine. In VLC they look just like what's in the subtitle itself:
<i>â?ª And every shadow â?ª</i>
honeybunny (September 14, 2010):
You can replace them with * before watching; the encoding does not always work.
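A minimal Python sketch of that workaround. It assumes the downloaded .srt is UTF8 and that the stray characters are the CP1252 reading of the eighth note's UTF8 bytes, i.e. "♪" (the "?" in the post above is assumed to be a "™" the forum lost). The names `GARBLED_NOTE`, `star_out_music_notes` and `star_out_file` are hypothetical.

```python
from pathlib import Path

# Assumption: the garbled note is U+266A EIGHTH NOTE whose UTF8 bytes
# [226, 153, 170] were read as CP1252, giving "â", "™" and "ª".
GARBLED_NOTE = bytes([226, 153, 170]).decode("cp1252")


def star_out_music_notes(text: str) -> str:
    """Replace every garbled music-note sequence with '*'."""
    return text.replace(GARBLED_NOTE, "*")


def star_out_file(path: str) -> None:
    """Rewrite a UTF8 subtitle file in place with the garbled notes starred out."""
    p = Path(path)
    # Reading and writing raw bytes keeps the file's line endings (often CRLF) as they are.
    text = p.read_bytes().decode("utf-8")
    p.write_bytes(star_out_music_notes(text).encode("utf-8"))


if __name__ == "__main__":
    line = "<i>" + GARBLED_NOTE + " And every shadow " + GARBLED_NOTE + "</i>"
    print(star_out_music_notes(line))  # <i>* And every shadow *</i>
```

Replacing with "*" only hides the symptom; the characters are still stored wrongly on the site.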
tzot (September 15, 2010):
Quote (honeybunny): "u can replace them with * before watching. the encoding does not always work."
(See also here: )
Guys, a little Unicode primer here. ASCII characters (bytes 0-127) are stored the same way in a file by all the ASCII-compatible encodings in common use (such as CP1252 and UTF8). The issue begins when you want to store non-ASCII characters: everything written to a file MUST be converted to bytes, and those bytes can mean many things, depending on the encoding used.
Now, I understand the site stores the files internally as UTF8. That means that “é” (Unicode character 233) becomes, in UTF8, 2 bytes with the values [195, 169], which look like “Ã©” if read as CP1252.
When uploading through the web interface, I've seen that there is no way to tell the form that the subtitle is already encoded as UTF8, so if you choose e.g. English as the language, the site interprets the incoming data as CP1252 (Windows Western). In CP1252 every character takes exactly one byte, so the site decodes every byte as a separate character and converts that character to UTF8: the two bytes of “é” are interpreted as the two characters “Ã©”, which are converted to UTF8 as 4 bytes, [195, 131, 194, 169], and stored like this. When you download the subtitle and use it, the player understands that this is a UTF8-encoded file, so it decodes the UTF8, and the 4 bytes of my example become the 2 Unicode characters “Ã©”, which is what you see on your screen.
Take the character “♪”: this is Unicode character 9834, named “EIGHTH NOTE”. Stored as UTF8, it becomes 3 bytes, [226, 153, 170] (“♪” if read as CP1252), and quite possibly the uploader sees the subtitle correctly in their player. If the file were stored exactly like that on the site, everything would be fine; however, the process I described occurs during the upload, so the site interprets the 3 bytes as CP1252, decodes them into 3 separate characters (“♪”) instead of 1 (“♪”), and encodes them as UTF8 into 7 bytes: [195, 162, 226, 132, 162, 194, 170].
You download the file, the media player understands that it's a UTF8-encoded file, so it decodes the UTF8 (and it does this decoding ONCE), and the 7 bytes become the 3 characters “♪”, which are the ones shown on your screen.
Now, sometimes the input files are not UTF8 encoded but GBK encoded (a legacy Chinese encoding), and this happens often with subtitles acquired from yyets.net (something like that). There things get more troublesome, since during the upload the GBK-encoded bytes are decoded as CP1252 and then encoded and stored as UTF8. All hell breaks loose.
Confusing? Sorry. I have a Python script that fixes these things automatically (95% of the time) and produces a correct UTF8 file; I can make that script available to uploaders and editors, who hopefully don't upload through the web interface. However, a way to upload raw bytes to the site (without any encoding/decoding step) MUST be created for us lame uploaders, so that these issues become much easier to solve, or never arise in the first place.
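The byte counts above can be checked, and the damage reversed, with a short Python sketch. It assumes, as described above, that the site decodes the uploaded bytes as CP1252 and re-encodes them as UTF8 exactly once. `upload_roundtrip`, `undo_double_encoding` and `repair_file` are hypothetical names, and this is not the script mentioned in the post. It handles only the UTF8 case, not the GBK one; also, Python's strict `cp1252` codec rejects the five byte values CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), and how the site treats those is unknown.

```python
from pathlib import Path


def upload_roundtrip(original: str) -> bytes:
    """Simulate the upload path described above (an assumption, not the site's code):
    the uploader's UTF8 bytes are decoded as CP1252, then re-encoded as UTF8."""
    uploaded = original.encode("utf-8")   # bytes in the uploader's file
    misread = uploaded.decode("cp1252")   # CP1252: every byte becomes one character
    return misread.encode("utf-8")        # what the site stores and serves


def undo_double_encoding(text: str) -> str:
    """Reverse one such round trip (hypothetical helper).

    If `text` was not damaged this way, one of the two steps raises and the
    text is returned unchanged.
    """
    try:
        # Map each garbled character back to the single CP1252 byte it came from...
        original_bytes = text.encode("cp1252")
        # ...and decode those bytes as the UTF8 the uploader actually had.
        return original_bytes.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def repair_file(src: str, dst: str) -> None:
    """Write a repaired copy of a downloaded subtitle file to `dst`."""
    # "utf-8-sig" drops a leading BOM, which has no CP1252 byte and would
    # otherwise make the whole repair fall back to "unchanged".
    text = Path(src).read_bytes().decode("utf-8-sig")
    # Working on bytes (not text mode) keeps the file's CRLF line endings intact.
    Path(dst).write_bytes(undo_double_encoding(text).encode("utf-8"))


if __name__ == "__main__":
    for ch in ("é", "♪"):
        stored = upload_roundtrip(ch)
        shown = stored.decode("utf-8")  # the player decodes UTF8 exactly once
        print(repr(ch), list(ch.encode("utf-8")), "->", list(stored), "->", repr(shown))
        print("repaired:", repr(undo_double_encoding(shown)))
```

Running it shows "é" going from 2 to 4 bytes and "♪" from 3 to 7 bytes, and the repair giving back the original characters. The all-or-nothing `try` is a deliberate choice: a file that mixes correct and garbled non-ASCII text is left alone rather than half-converted.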