generators != lists (?)

My never-ending quest to write automated tests for web applications mostly involves twill tests, run with nose. Because I am muddling through this more or less on my own, I rely very heavily on documentation and examples, most of which were written by developers for other developers. I’m not a developer, so I spend a lot of time being really confused and asking my devs annoying questions.

Here is today’s issue, probably trivial to anyone who does this regularly, but I’m sure to forget about it and gnash my teeth later.

The twill docs tell me that, “When called from Python, this function [showlinks()] returns a list of the link objects.” Awesome. What I am trying to do is follow link[0], which has no name (so I can’t use follow()). I figured I could get the URL out of the list, and go there. Easy lemon squeezy, right?

But here’s what happens:


>>> showlinks()
Links:
0. ==> http://www.rfc-editor.org/rfc/rfc2606.txt
<generator object at 0x89828>

Okay, so now I’ve got a generator object to play with. It can make a list, but it doesn’t just return one the way I thought it would. But whatever, let’s make it a list:


>>> links = list(showlinks())
Links:
0. ==> http://www.rfc-editor.org/rfc/rfc2606.txt
>>> links[0]
Link(base_url='http://www.example.com', url='http://www.rfc-editor.org/rfc/rfc2606.txt', text='RFC 2606', tag='a', attrs=[('href', 'http://www.rfc-editor.org/rfc/rfc2606.txt')])

Aaaaand now I have a… list of tuples? I’m not sure what all I was expecting would be in a “link object,” but it wasn’t quite so much information. But I figure I’m almost there. Sadly, not so much. I think my main problem came from that “Link” in the beginning. What is that? I still don’t know. I tried a lot of things that made sense to me at the time:


>>> links[0][1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: Link instance has no attribute '__getitem__'
>>> dir(links)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__str__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
>>> links[0].url

So… “Link” is the instance of the link object? Which I thought was a list, and type() tells me it’s a list, but it has no __getitem__? Clearly there is something here with instances and objects and types that I’m just not getting. The solution turned out to be fairly simple, and even elegant, for that matter:


>>> links[0].url
'http://www.rfc-editor.org/rfc/rfc2606.txt'

But I confess I’m still at a bit of a loss as to why that’s the answer instead of something else.
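
For what it’s worth, here is a minimal sketch of why attribute access works where indexing doesn’t. The `Link` class below is a hypothetical stand-in, not the real class twill hands back, and it is written for current Python 3 rather than the Python 2 shown in the transcripts. The point is only that the list supports `[0]`, while the object sitting inside the list has named attributes and nothing else.

```python
class Link:
    """Hypothetical stand-in for the link objects twill hands back."""

    def __init__(self, base_url, url, text):
        self.base_url = base_url
        self.url = url
        self.text = text

    def __repr__(self):
        return "Link(base_url=%r, url=%r, text=%r)" % (
            self.base_url, self.url, self.text)


links = [Link("http://www.example.com",
              "http://www.rfc-editor.org/rfc/rfc2606.txt",
              "RFC 2606")]

# The list is indexable; the Link inside it is not.
first = links[0]
try:
    first[1]
    indexable = True
except TypeError:
    indexable = False

print(type(links).__name__)  # the container's type
print(type(first).__name__)  # the element's type
print(indexable)
print(first.url)
```

This also suggests why `dir(links)` was a dead end: it lists the methods of the list itself. Something like `dir(links[0])` would presumably have shown the attributes of the element, `url` among them.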


4 Comments »

Comment by nat
2008-04-04 10:27:16

follow() doesn’t just use the name of the link, you can also give it the link text, or the url, hell, even just part of the url. really anything that distinguishes it.

Comment by pam
2008-04-04 12:12:44

Hmmm. I think my problem is that showlinks() gives me something like this:

1. None ==> /20484

Where the number is dynamically generated. I am not sure how to follow that link.

 
 
Comment by nat
2008-04-04 13:26:48

follow(list(links)[1].url) perhaps?
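
A rough sketch of that idea, for the case where the link has no text and the URL changes on every run. The objects below are hypothetical stand-ins built with `SimpleNamespace`, not real twill link objects, and the URLs are made up to mirror the example above. The sketch picks the link by its missing text instead of by a hard-coded position.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for link objects; only .text and .url are assumed.
links = [
    SimpleNamespace(text="Home", url="/"),
    SimpleNamespace(text=None, url="/20484"),
]

# Keep only the links that have no text, then take the first one's URL.
unnamed = [link for link in links if link.text is None]
target_url = unnamed[0].url if unnamed else None

print(target_url)
```

The resulting URL could then be handed to whichever twill command you use to navigate.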

 
Comment by ars
2008-04-04 21:02:01

Generators are a nice and easy way to create iterators. Here’s a silly example:


>>> def gen():
...     yield 1
...     yield 2
...     yield 3
...
>>> type(gen)
<type 'function'>
>>> g = gen()
>>> type(g)
<type 'generator'>
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Basically, gen is just a function. It’s the yield statements that let Python know it’s a generator function. g is the actual generator object created by calling gen().

Calling g.next() the first time returns the first value that’s “yielded”. The second time, the function just picks up where it left off, so g.next() returns 2, and so on, until it drops off the end, at which point it raises a StopIteration exception. This is kind of how iterators work behind the scenes. Generators make it easier to write iterators without defining a class with next and __iter__ methods.
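
For comparison, here is a sketch of the class you would otherwise have to write by hand. It uses Python 3 spelling, where the method is `__next__` and you call `next(g)` instead of `g.next()`. The class name `CountToThree` is made up for this example.

```python
class CountToThree:
    """Hand-written iterator that yields the same values as gen()."""

    def __init__(self):
        self.current = 0

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):  # spelled `next` in Python 2
        if self.current >= 3:
            raise StopIteration
        self.current += 1
        return self.current


def gen():
    yield 1
    yield 2
    yield 3


from_class = list(CountToThree())
from_generator = list(gen())
print(from_class, from_generator)
```

Both produce the same sequence. The generator version just lets Python keep track of the position for you.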

You don’t usually use a generator in that way. Instead you would use it like an iterator, like this:


>>> for num in gen():
...     print num
...
1
2
3

Anyway, to your post … showlinks() returns a generator object, and you could call next() or just iterate through it with a for loop.

When you create a list (links = list(showlinks())), Python iterates through the entire generator and stuffs the values into that list.
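
A small sketch of what that means in practice, using Python 3. `fake_showlinks` is a hypothetical stand-in for showlinks() that just yields two made-up URLs. A generator can be walked only once, and it cannot be indexed until its values have been copied into a list.

```python
def fake_showlinks():
    """Hypothetical stand-in for showlinks(): yields URLs one at a time."""
    yield "http://www.example.com/a"
    yield "http://www.example.com/b"


links_gen = fake_showlinks()
first_pass = list(links_gen)   # consumes the generator
second_pass = list(links_gen)  # nothing left to yield

try:
    fake_showlinks()[0]        # generators do not support indexing
    subscriptable = True
except TypeError:
    subscriptable = False

print(first_pass)
print(second_pass)
print(subscriptable)
```

So it is usually worth keeping the list around once you have built it, since the generator it came from is used up at that point.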

Each element of the list is actually an instance of the Link class from the mechanize package (twill uses mechanize for its browsing facilities):


>>> link = links[1]
>>> link
Link(base_url='http://del.icio.us', url='https://secure.del.icio.us/login', text='login', tag='a', attrs=[('href', 'https://secure.del.icio.us/login')])
>>> type(link)

>>> link.__class__

>>> dir(link)
['__cmp__', '__doc__', '__init__', '__module__', '__repr__', 'absolute_url', 'attrs', 'base_url', 'tag', 'text', 'url']
>>> link.url
'https://secure.del.icio.us/login'
>>> link.text
'login'
>>> link.absolute_url
'https://secure.del.icio.us/login'
>>> link.tag
'a'

Hope that makes sense.

 