Set.insert vs array << x unless array.include?(x)
19 posts
• Page 1 of 1
Set.insert vs array << x unless array.include?(x)I was curious of the difference between Set.insert vs array << x unless array.include?(x) when you want a collection with a unique set of values.
So I did some tests: Test 1 A 10,000,000 times I tried to insert a number between 0-10. t=Time.now;a=[];10000000.times{r=rand(10);a<<r unless a.include?(r)};puts Time.now - t Result: 12.297 t=Time.now;a=Set.new;10000000.times{a.insert(rand(10))};puts Time.now - t Result: 15.719 The array was faster here by three seconds. But that's with ten million iterations! Test 2 But, when your collection of unique values increases: t=Time.now;a=[];100000.times{r=rand(10000);a<<r unless a.include?(r)};puts Time.now - t Result: 45.129 t=Time.now;a=Set.new;100000.times{a.insert(rand(10000))};puts Time.now - t Result: 0.391 When your collection include a much higher number of elements - then the Array quickly starts to crumble. In this case, when the possible unique set was up to 100,000 the difference was 44.738 seconds!!! I tried this in some of my scrips which on occasions require a unique set of entites (like collecting all vertices) and the speed difference was immense on larger models. It doesn't take too many number of unique elements before Set becomes faster then array.include?. Scaling Just to illustrate how they scale: Array.include? t=Time.now;a=[];1000.times{r=rand(100);a<<r unless a.include?(r)};puts Time.now - t 0.0 t=Time.now;a=[];10000.times{r=rand(1000);a<<r unless a.include?(r)};puts Time.now - t 0.485 t=Time.now;a=[];100000.times{r=rand(10000);a<<r unless a.include?(r)};puts Time.now - t 42.906 t=Time.now;a=[];1000000.times{r=rand(100000);a<<r unless a.include?(r)};puts Time.now - t Took too long. I guess it'd take ~4000.0s Set.insert t=Time.now;a=Set.new;1000.times{a.insert(rand(100))};puts Time.now - t 0.0 t=Time.now;a=Set.new;10000.times{a.insert(rand(1000))};puts Time.now - t 0.047 t=Time.now;a=Set.new;100000.times{a.insert(rand(10000))};puts Time.now - t 0.422 t=Time.now;a=Set.new;1000000.times{a.insert(rand(100000))};puts Time.now - t 4.937 Conclusion (not!) Use the Set class when you require a unique set of values - unless your set of unique values is very small. Set class is the rule - Array.include? is the exception. Update None of the above is the fastest. A better way has appeared. Collect everything into an array and apply .unique! at the end. viewtopic.php?f=180&t=23760&p=222231#p222221 Last edited by thomthom on Fri Nov 20, 2009 1:33 pm, edited 1 time in total.
Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: Set.insert vs array << x unless array.include?(x)Also, Set.include? is faster than Array.include?
Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: Set.insert vs array << x unless array.include?(x)Tom,
very useful actually. It really looks that the processsing in C is a lot faster than in Ruby. Thanks Fredo
Re: Set.insert vs array << x unless array.include?(x)
Set class makes more processing in C? What I'm wondering is, the Set class in SU Ruby is not the same Set class you get in Ruby 1.8.0. The Set class in Ruby 1.8.0 includes Enumerable and the set.rb looks to be a pure ruby class that mixes arrays and hashes. From my Ruby 1.8.0. installation, the Set class seem to be a pure Ruby implementation. But maybe the Set class that comes with SU is a C implementation...? Maybe that's why they replaced it? Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: Set.insert vs array << x unless array.include?(x)I thought the speed difference was that the Set class used a more efficient Hash lookup as oppose to the Array functions that has to iterate the array every time... but that's just guesswork...
Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: Set.insert vs array << x unless array.include?(x)
You're right. I personally use Hash whenever I want to store lists that have unique elements. The list itself is obtained via Hash.values. Maybe, as you are on this, you extend your benchmark to Hash and see how it compares with Set. Fred
Re: Set.insert vs array << x unless array.include?(x)Great info Thom! I had played around with sets and arrays before and I was not impressed with sets. But I did not experiment with amount of unique objects, which apparently has an adverse effect on arrays. Thanks,
Chris
Re: Set.insert vs array << x unless array.include?(x)[quote="Fredo6"
Maybe, as you are on this, you extend your benchmark to Hash and see how it compares with Set.[/quote] Will have a look at that. Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: Set.insert vs array << x unless array.include?(x)How about using Array.uniq! method:
Test 1 t=Time.now;a=[];10000000.times{r=rand(10);a<<r unless a.include?(r)};puts Time.now - t Result: 12.297 t=Time.now;a=Set.new;10000000.times{a.insert(rand(10))};puts Time.now - t Result: 15.719 t=Time.now;a=[];10000000.times{r=rand(10);a<<r};a.uniq!; puts Time.now - t Result: 7.753 Test 2 t=Time.now;a=[];100000.times{r=rand(10000);a<<r unless a.include?(r)};puts Time.now-t Result: 40.97 t=Time.now;a=Set.new;100000.times{a.insert(rand(10000))};puts Time.now-t Result: 0.377 t=Time.now;a=[];100000.times{r=rand(10000);a<<r};a.uniq!;puts Time.now-t Result: 0.087 Last edited by Jernej Vidmar on Fri Feb 05, 2010 1:40 pm, edited 2 times in total.
Re: Set.insert vs array << x unless array.include?(x)I'll be damned!
Very interesting Jernej. ...looks like I need to do some more testing of my script and possibly refactor again. So while the Array.include? is dead slow - the overhead of hash look-up is still faster than just adding everything into one big pile and so a single filtering afterwards... Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: Set.insert vs array << x unless array.include?(x)looking at the .uniq! source code: http://ruby-doc.org/core/classes/Array.src/M002215.html
Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: Set.insert vs array << x unless array.include?(x)Jernej: how about larger iterations and higher number of random values?
Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: Set.insert vs array << x unless array.include?(x)
t=Time.now;a=Set.new;10000000.times{a.insert(rand(10000))};puts Time.now - t Result: 37.911 t=Time.now;a=[];10000000.times{r=rand(10000);a<<r};a.uniq!; puts Time.now - t Result: 8.282 Still a winner?
Re: Set.insert vs array << x unless array.include?(x)It's refactoring time!
Nice find! ![]() Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: Set.insert vs array << x unless array.include?(x)That's all great (using .uniq!) until you start dealing with Point3d objects
![]() In that case, always use Set. RickW
www.smustard.com Re: Set.insert vs array << x unless array.include?(x)....or make all of your Point3d's into arrays so they will sort!/uniq! etc as arrays...
TIG
Re: Set.insert vs array << x unless array.include?(x)
But is the overhead of converting the Point3d's into arrays and uniq! faster than using a Set? Thomas Thomassen — SketchUp Monkey & Coding addict
List of my plugins and link to the CookieWare fund
Re: Set.insert vs array << x unless array.include?(x)Who knows ?
Time for you to do another test... ![]() TIG
Re: Set.insert vs array << x unless array.include?(x)I probably don't know what I am doing, but I ran the following test, and obtained the attached results. I typically use array.push variable, and don't understand the situations when the other examples might be used. Btw, when I applied the other forms to my app, it failed in ways that leave me to believe that those forms are data sensitive. Can anyone explaine to a Ruby beginner what's up?
19 posts
• Page 1 of 1
|