[Code] file_found?(path) and to_ascii+to_unicode.rb
42 posts
• Page 1 of 2 • 1, 2
[Code] file_found?(path) and to_ascii+to_unicode.rb
This "file_found?(path)" can be used in place of "FileTest.exist?(path)", which is known to return an erroneous "false" when the file does indeed exist. This occurs if the system returns the filepath containing 'unicode' characters (typically accented: "qualisé.skp", "C:\\test æøå\\" etc.) but, perversely, Ruby SUp returns the model path in raw 'ascii': unfortunately these will look the same but they are != ...
It will fail if the unicode can't be translated directly into ascii... Thanks to thomthom for streamlining my code etc. whilst he was trying to fix his Norwegian character problems...
Last edited by TIG on Wed Jul 01, 2009 5:28 pm, edited 1 time in total.
TIG
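To make the "look the same but are !=" point concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8, and it borrows the unpack/pack expression quoted later in this thread; the variable names are only illustrative.

```ruby
# Assumption: this file is saved as UTF-8, so the literal below is UTF-8 encoded.
utf8_name   = "qualisé.skp"
# One byte per character, using the expression quoted later in the thread.
single_byte = utf8_name.unpack('U*').pack('C*')

utf8_bytes   = utf8_name.unpack('C*')    # raw bytes of the UTF-8 form
single_bytes = single_byte.unpack('C*')  # raw bytes of the one-byte form

p utf8_bytes.length           # 12: "é" takes two bytes in UTF-8
p single_bytes.length         # 11: "é" is the single byte 233
p utf8_bytes == single_bytes  # false, so a byte-wise file lookup can miss
```

Both strings may display identically in a console, yet a byte-for-byte comparison fails, which is consistent with the behaviour described above.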
Re: [Code] file_found?(path)
Maybe add a method that returns the UTF-8 string as ASCII? It's needed when you pass the string over to open files using the File class.
The file dialog also returns a UTF-8 string, so that also has to be processed before it's given to any string or file manipulation methods.
Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
This is an updated version. Note that if you have code containing an earlier version then to_ascii() becomes to_unicode() and vice versa - you will need to make adjustments...
Ruby Code: to_ascii+to_unicode.rb
Usage:
text = to_unicode(txt_ascii)
text = to_ascii(txt_unicode)
Returns text converted to either Unicode or ASCII characters...
Why? SUp returns model.path, txt=UI.openpanel("?","c:\\","*.txt") etc. in Unicode characters, whereas the Ruby system returns things in ASCII: so e.g. FileTest.exist?(txt) wrongly returns 'false' - it fails to see the match although the file really exists. However,
txt = UI.openpanel("?","c:\\","*.txt")
FileTest.exist?(to_ascii(txt))
correctly returns 'true', as does...
FileTest.exist?(to_ascii(Sketchup.active_model.path))
(Note you no longer need to use the limited and clunky 'file_found?(path)' ruby.)
If for some reason you have ASCII text it can be made into Unicode using the other form...
text = to_unicode(txt)
v1.1 20090707 Names swapped round to reflect correct usage - [thanks to thomthom]
Last edited by TIG on Tue Jul 07, 2009 8:43 am, edited 1 time in total.
TIG
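The attached to_ascii+to_unicode.rb is not reproduced in this thread, so the following is only a sketch of what the two methods might look like. The body of to_ascii uses the expression quoted later in the thread; the body of to_unicode is an assumed inverse, not TIG's confirmed code.

```ruby
# Sketch only, not the attached script.
# Assumption: this file is saved as UTF-8.

# UTF-8 text -> one byte per character (expression quoted later in the thread).
def to_ascii(txt_unicode)
  txt_unicode.unpack('U*').pack('C*')
end

# One byte per character -> UTF-8 text (assumed inverse of the above).
def to_unicode(txt_ascii)
  txt_ascii.unpack('C*').pack('U*')
end

path       = "C:\\test æøå\\model.skp"   # hypothetical path for illustration
round_trip = to_unicode(to_ascii(path))
p round_trip == path                     # true
```

The round trip only holds while every character has a code point of 255 or less, which matches the "it will fail if the unicode can't be translated directly" warning in the first post.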
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
I think you've swapped the method names for the two methods. They do the complete opposite of what I expect.
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
I initially thought that too...
BUT the way I looked at it was that SUp returns model.path, openpanel etc. in basic ascii text, whereas FileTest.exist?() looks through the file names in unicode [which might contain accented characters without the normal ascii codes]. So to test if the model's path exists you need to make it into unicode... therefore FileTest.exist?(to_unicode(model.path)) returns true with accented characters... Since the model's path is already in ascii we shouldn't to_ascii() it ??? Is it me going mad or is my thinking the right way round...
TIG
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
I was under the impression that SU's ruby methods returned UTF-8 (definition.name also returns UTF-8) and Ruby itself assumed ASCII.
But in any case, a method named .to_ascii I'd assume returns ASCII - while yours returns UTF-8. And vice versa.
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
Thanks. I have noticed the problem before, but I couldn't tell what was wrong.
Tomasz Author of Thea Render for SketchUp
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
Does this fix umlauts as well? I just had a Swedish user stumble onto this problem last week.
CB.
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
It should fix most European accented and odd characters that unicode puts as 2 ascii bits... but I don't know about Asian etc... Any users out there?
TIG
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
I think it only works if the characters exist in the ASCII character set...
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
I think thomthom is right.
If a raw ASCII character has a direct equivalent Unicode encoding then it works fine - and that should apply to all 0-255 ASCII characters - including those above the normal keyboard set... €‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°± ²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ. However, lots of other Asian languages etc. will use Unicode sets that have no direct translation into ASCII...
TIG
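One way to check in advance whether a given string can survive the conversion is to look at its code points. The helper below is hypothetical (its name is not from the thread) and assumes a UTF-8 source file. It also suggests a caveat to the list above: characters such as the euro sign sit in the 128-159 byte range of Windows code pages, but their Unicode code points are well above 255.

```ruby
# Hypothetical helper: true when every character has a code point of 255 or
# less, i.e. when it can be stored in a single byte by a plain pack('C*').
# Assumption: this file is saved as UTF-8.
def fits_single_byte?(text)
  text.unpack('U*').all? { |code_point| code_point <= 255 }
end

p fits_single_byte?("æøå ÀÿÑ")   # true: all within the 0-255 range
p fits_single_byte?("€")         # false: the euro sign is U+20AC (8364)
p fits_single_byte?("日本語")     # false: far outside the single-byte range
```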
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
I think most Western Latin character sets will work.
Though, after looking at your code again TIG, your to_ascii method returns unicode, while to_unicode returns ascii.
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
thomthom
I swapped the names over because what I now call to_unicode(model.path) works! The ASCII code returned by SUp Ruby is made into Unicode-friendly code, so FileTest.exist?(to_unicode(model.path)) returns true (working properly), BUT FileTest.exist?(model.path) returns false (not working properly).
If you use txt=model.path you get ASCII: with an accented model.skp file-name, when you list its character codes you get a single code for the accented letter. Now use FileTest and Dir tools on the same directory and get the same file's name, then list its character codes. Although they look the same on the page/screen the codes are not identical, so no == is possible when comparing raw SUp ASCII model.path and FileTest... That's because Ruby outside of SUp returns it in Unicode for the accented characters [if you just use the base set of characters and numbers they match - accented characters use different encoding though]; the list of character codes in this 'extra=SUp' version will show the accented letter represented by two codes, not one = Unicode...
It's ironic that it's called 'uni'code when it effectively uses two characters to encode accented letters - that can be done in one with ASCII...
TIG
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
I'm afraid that's wrong.
SU methods, such as model.path and definition.name, return Unicode strings, while the File class uses ASCII. Your to_unicode returns an ASCII string: it takes the double/multi-byte characters of the unicode string and packs them into single-byte ASCII. That's why it works.
It's not really characters - but bytes. ASCII always uses 1 byte per character with a limited character set. UTF-8 uses from one-two bytes per character with a much greater character set range and can be used in a great number of languages. Unicode uses one-four bytes per character and is supposed to be able to define all languages - hence unicode. Also, ASCII only defines 128 characters - half of the possibilities of a byte - which is why accented characters fall outside this and have 2-byte characters in UTF-8. The reason why we see two characters in Ruby (also in PHP) is that it always assumes 1 byte per character and interprets the string incorrectly.
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
Sorry, UTF-8 uses one-four bytes...
Thomas Thomassen — SketchUp Monkey & Coding addict
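The one-to-four byte range of UTF-8 can be seen by counting the raw bytes of a few sample characters. This is a small sketch that assumes the file is saved as UTF-8; the sample characters are arbitrary picks, not taken from the thread.

```ruby
# Assumption: this file is saved as UTF-8.
samples     = ["a", "å", "€", "😀"]
# unpack('C*') yields one integer per raw byte, whatever the characters are.
byte_counts = samples.map { |ch| ch.unpack('C*').length }

samples.zip(byte_counts).each do |ch, count|
  code_point = ch.unpack('U*').first
  puts "code point #{code_point} takes #{count} byte(s) in UTF-8"
end
# byte_counts is [1, 2, 3, 4]
```

Only the first 128 code points (plain ASCII) stay at one byte, which is why accented letters are where the trouble starts.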
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
thomthom
I at last see your explanation now... It's Ruby that's wrong, NOT SUp... The more modern coding of SUp means that it returns text that is out of step with Ruby, which is a bit old and clunky when it comes to Unicode? Whatever it's called, it still works!!! ... Perhaps we [I] should have given it another 'neutral' name - however, I have simply swapped over the names and issued an updated version here... viewtopic.php?p=169274#p169274
TIG
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
Yes. That is seen if you type "å".length in the ruby console. The console also uses Unicode (I think SU uses UTF-8 encoding). The return value is 2 - which shows that ruby doesn't deal with multi-byte characters. I had the very same problem with PHP when I was making websites. It also assumes 8 bits per character. I thought PHP was the only dinosaur and I'm very surprised that this new Ruby language doesn't deal with Unicode strings.
Might be good to find some names that relate best to what is actually returned. It's not really ASCII ruby uses either - as mentioned, ASCII only uses the first 128 values of a byte. It might be ANSI - which extends ASCII to the full 256. But it might be an ISO-8859-x encoding (in which case it's most likely ISO-8859-1 Latin-1 Western European). When I look at Notepad++ and Notepad they are both set to encode in ANSI - so my current best guess is ANSI. I'll have to have a closer look at what the methods actually do to determine what they actually return.
Character encodings are a nightmare. And with SU being UTF-8 and Ruby ANSI(?) - this is just begging for problems. I think there are some UTF-8 libraries which we can use without breaking existing code. Some scripts might rely on Ruby treating multi-byte characters as multiple characters, so we can't really modify the existing methods. But maybe some of the libraries offer some good conversion tools. Maybe a custom Unicode string class which people can use if they need to deal with unicode characters and string manipulations. Converting from unicode to ANSI and back is so prone to errors. (I spent ages struggling with this when I was making a parser in PHP for my UTF-8 encoded website.)
I've been meaning to write up a gotcha-thread for ruby scripting - character encoding is one of the important points I need to include. Considering the widespread usage of SU in various languages I'm surprised this problem hasn't been mentioned more in the forums.
Thomas Thomassen — SketchUp Monkey & Coding addict
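For scripts that need a real character count rather than a byte count, one possible workaround is to decode the UTF-8 first and count code points. This is a sketch under the assumption that the string is valid UTF-8 and the file is saved as UTF-8; the sample word is arbitrary.

```ruby
# Assumption: this file is saved as UTF-8.
name = "blåbær"

char_count = name.unpack('U*').length   # decode UTF-8, count code points
byte_count = name.unpack('C*').length   # count raw bytes

p char_count   # 6
p byte_count   # 8: "å" and "æ" take two bytes each
```

On Ruby 1.8, String#length reports the byte count (8 here); from Ruby 1.9 onward it reports characters, so the unpack approach mainly matters for the older interpreter discussed in this thread.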
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
Ok. I'm pretty sure that Ruby uses ANSI. "ANSI" isn't really the name either (character encoding is a topic that will hurt your brain) but actually Windows-1252. It's not really an ANSI standard. Windows-1252 is a superset of ISO-8859-1.
UTF-8, which SU uses (I'm pretty sure of this. I was reverse engineering the .skp format and that used a mix of ANSI and UTF-8), is backwards compatible with ASCII - all ASCII characters (the first 128 of UTF-8) are mapped the same in UTF-8. Characters outside the 128 ASCII set are mapped with at least two bytes. However, Ruby uses Windows-1252 (ANSI), which also maps the ASCII set to its first 128 characters, but it has extra characters for its remaining 128. This means for western european languages we can map UTF-8 strings back to ANSI with fair success. This is what the two methods TIG has do. I don't know what happens if you try to map UTF-8 characters that don't exist in the Windows-1252 set.
TIG: my suggestion for the method names:
.to_ansi
.to_utf8
This will correctly describe what they do. Unicode can be many things; it could have more bytes per character in different byte order, so utf8 will give the correct assumption of what will happen.
Thomas Thomassen — SketchUp Monkey & Coding addict
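On the open question of unmappable characters: Ruby 1.9 and later (not the Ruby 1.8 this thread is about) ship String#encode, which makes the behaviour easy to probe. The sketch below assumes a UTF-8 source file and a Ruby version with transcoding support; the sample strings are arbitrary.

```ruby
# Requires Ruby 1.9+ (String#encode). Assumption: this file is saved as UTF-8.
euro = "€"

# Windows-1252 has a slot for the euro sign: byte 0x80.
p euro.encode("Windows-1252").bytes          # [128]

# ISO-8859-1 does not, so the conversion raises.
raised = false
begin
  euro.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  raised = true
  puts "No ISO-8859-1 equivalent: #{e.message}"
end

# Substituting a placeholder instead of raising:
fallback = "日本語.skp".encode("Windows-1252", undef: :replace, replace: "?")
p fallback.bytes.length                      # 7, i.e. "???.skp"
```

So in newer Rubies an unmappable character either raises or is replaced, depending on the options passed; it is not silently truncated.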
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
Ok, digging deeper into this, Ruby 1.8 doesn't have an encoding type at all. It simply treats String as a series of bytes (8 bits each): http://blog.grayproductions.net/article ... in_ruby_18
Since ASCII, Windows-1252 and ISO-8859-? work within the 8-bit range, they can be handled in Ruby without further processing of the string.
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
http://blog.grayproductions.net/categor ... _encodings <- looks like some interesting reading.
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
Not sure if .to_ansi is the correct name any more since Ruby has no encoding. Maybe it's a coincidence that it works on my system (Windows), which uses the Windows-1252 encoding. I expect that return str_utf8.unpack('U*').pack('C*') returns correct ASCII for the first 128 byte set, but it's the rest that has me puzzled. On my system it maps fine to ANSI, but maybe a different system might behave differently when it comes to the accented characters....
Thomas Thomassen — SketchUp Monkey & Coding addict
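One way to see what unpack('U*').pack('C*') produces for accented letters is to compare its bytes with those from an explicit conversion. This sketch needs Ruby 1.9+ for String#encode, assumes a UTF-8 source file, and uses an arbitrary sample string.

```ruby
# Requires Ruby 1.9+ for String#encode. Assumption: file saved as UTF-8.
text   = "æøå"
packed = text.unpack('U*').pack('C*')   # the expression from the post above

packed_bytes = packed.unpack('C*')
p packed_bytes                                          # [230, 248, 229]
p packed_bytes == text.encode("ISO-8859-1").bytes       # true
p packed_bytes == text.encode("Windows-1252").bytes     # true
```

Each byte is simply the character's code point, which coincides with ISO-8859-1 for values up to 255. Windows-1252 agrees for these letters, which may explain why the result looks like "ANSI" on a Western European Windows system.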
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
http://redmine.ruby-lang.org/issues/show/877
This is along the lines of what I thought. The File / IO classes under Windows appear to demand ANSI (1252) to operate. The question is: what happens on Mac systems? I need to poke around on my Mac when I get home.
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
This is also an interesting comment:
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
Ok, I know I'm going back and forth here, but at the moment .to_ansi might be wrong. .to_ascii might not be a complete indication of what's returned, but we know that we get the ASCII characters. .to_single_byte_characters or .to_unsigned_chars is more accurate, but those are rather long names.
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
Why don't we call the script "friendlytext.rb" and call them
ruby_friendly(text) and sup_friendly(text)
That way FileTest.exist?(ruby_friendly(model.path)) returns true and you can turn Ruby-made text back to suit SUp using the other form? ... This also doesn't enter into this ansi/ascii/unicode/utf8 territory, which looks like a quicksand... It also doesn't apportion blame!!!
thomthom, you seem to be spending more time on this than me... why don't you take it over and decide what to call the methods? I'd be happy to hand it over...
TIG
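A possible shape for the proposed "friendlytext.rb" is sketched below. Only the two method names come from the post above; the bodies reuse the unpack/pack expression quoted earlier in the thread, and the guard against characters above 255 is an added assumption, not something agreed in the discussion.

```ruby
# Sketch of a possible friendlytext.rb. Assumption: file saved as UTF-8.

# UTF-8 text from SketchUp -> one byte per character for Ruby's file methods.
def ruby_friendly(text)
  code_points = text.unpack('U*')
  # Assumption: refuse rather than silently mangle characters above 255.
  if code_points.any? { |code_point| code_point > 255 }
    raise ArgumentError, "text has characters that do not fit in one byte"
  end
  code_points.pack('C*')
end

# One byte per character -> UTF-8 text to hand back to SketchUp.
def sup_friendly(text)
  text.unpack('C*').pack('U*')
end

name = "blå.skp"                                 # hypothetical file name
p sup_friendly(ruby_friendly(name)) == name      # true
```

Raising on unconvertible input is one design choice among several; returning the text unchanged would be another reasonable option for scripts that prefer to carry on.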
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
That's a very pragmatic solution to it. I like it!
Since I use Norwegian characters that fall into this encoding trap, it's rather important to me to know what's going on. I might look into an extra set of helper functions that I'll add to the SKX project. But I'd need some time to work out what's really going on. For now these snippets will provide enough functionality for most western languages.
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
thomthom
She's all yours... If you want me to remove any early stuff let me know [PM etc.]...
TIG
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
I think it can stick with the friendly names you suggested.
I'm doing more research. Just signed up for a Ruby forum to work out how Ruby behaves. Once I've gathered the info I need I'll make a thread describing the findings.
Thomas Thomassen — SketchUp Monkey & Coding addict
Re: [Code] file_found?(path) and to_ascii+to_unicode.rb
FYI, for anyone who (most unlikely) might be following my ramblings - I've started a new thread over at a Ruby forum for further investigation: http://www.ruby-forum.com/topic/191016#833043
Thomas Thomassen — SketchUp Monkey & Coding addict