[Planetlab-devel] inter-plc federation - email uniqueness - proposed patch & discussion

Thierry Parmentelat Thierry.Parmentelat at sophia.inria.fr
Thu Jun 14 04:07:12 EDT 2007


Following up on this topic:
I have been testing this change on both a real deployment -- the one 
that runs the peering trial with the public PLC -- and in vitro -- on a 
dedicated couple of myplcs that run the TestPeers.py code -- and have 
not seen any drawback with this patch.

I would appreciate an update from Princeton about your position on this 
proposal, and I would be grateful if this change could make its way 
until your production code at some point, so I could relax the 
constraint on email for our users (so far we have been using fake 
aliases in the @one-lab.org domain).

-- Thierry


Thierry Parmentelat wrote:
> Hi Tony
>
> I think I have a working version for this email-namespace problem
> Please find below the patch I am suggesting -- I have not mentioned 
> the change in Nodes.py (about validate_last_contact) since I think you 
> already merged it in the princeton codebase.
>
> The only additional trick is to artificially update the instances of 
> persons as returned from GetPeerData. More specifically, GetPeerData 
> returns person dicts where peer_id is set to None (since we've asked 
> for the peering end's local persons), so I just need to set it to the 
> local peer_id that depicts the peering end, and use the idea that I 
> described last week, about having validate_email check for duplicates 
> on the candidate person's peer_id only.
>
> ==
> As I said I am now going to review GetPeerData, and try and narrow the 
> scope of the fields that get cached for federation. For instance with 
> the current code, and since the introduction of last_contact, I have 
> almost all nodes that get refreshed during two successive calls to 
> RefreshPeer; I suspect this rather unfriendly behaviour to be due to 
> last_contact being changed frequently (how often is that supposed to 
> be, btw ?)
> I hope this review can also drastically improve the overall 
> performance of RefreshPeer -- at least by reducing the size of the xml 
> traffic -- but to that end I am back to a decent federation test setup 
> so this will take a while.
> So more on this later.
> ==
>
> I've been thinking about the possible implications of this change on 
> the web pages, but as far as I can tell at this stage, there should 
> not be any in fact.
>
> -- Thierry
>
> $ svn diff
> Index: PLC/Persons.py
> ===================================================================
> --- PLC/Persons.py      (revision 492)
> +++ PLC/Persons.py      (working copy)
> @@ -100,7 +100,13 @@
>         if len(domain) < 2:
>             raise invalid_email
>
> -        conflicts = Persons(self.api, [email])
> +       # Thierry - June 11 2007
> +       # check only against users on the same peer
> +       if 'peer_id' in self:
> +           namespace_peer_id = self['peer_id']
> +       else:
> +           namespace_peer_id = None
> +        conflicts = Persons(self.api, 
> {'email':email,'peer_id':namespace_peer_id})
>         for person in conflicts:
>             if 'person_id' not in self or self['person_id'] != 
> person['person_id']:
>                 raise PLCInvalidArgument, "E-mail address already in use"
> Index: PLC/Methods/RefreshPeer.py
> ===================================================================
> --- PLC/Methods/RefreshPeer.py  (revision 492)
> +++ PLC/Methods/RefreshPeer.py  (working copy)
> @@ -228,6 +228,10 @@
>
>         # Keyed on foreign person_id
>         old_peer_persons = Persons(self.api, {'peer_id': peer_id}, 
> columns).dict('peer_person_id')
> +       # artificially attach the persons returned by GetPeerData to 
> the new peer
> +       # this is because validate_email needs peer_id to be correct 
> when checking for duplicates
> +       for person in peer_tables['Persons']:
> +           person['peer_id']=peer_id
>         persons_at_peer = dict([(peer_person['person_id'], peer_person) \
>                                 for peer_person in 
> peer_tables['Persons']])
>
>
>
> Tony Mack wrote:
>> Hi Thierry,
>>
>> It seems this issue is a little tricky, but I think you are headed in 
>> the right direction. I will be bringing up an alternate plc soon, 
>> which will act as a test environment for new development. We can use 
>> this alternate plc to help us test these federation issues.
>>
>> BTW, I will update Nodes.py to validate last_contact, thanks for 
>> pointing this out.
>>
>> -- Tony
>>
>> Thierry Parmentelat wrote:
>>> Tony
>>> I did some progress on this topic:
>>> - the patch I sent does not work as is because as I suspected there 
>>> are cases where 'peer_id' is not provided at all to validate_email
>>> - the protected version where we filter against peer_id when it is 
>>> provided, or against peer_id=None when it's not, is half good. With 
>>> this version I can locally add an e-mail address that is already 
>>> present from another peer
>>> - but I cannot successfully cache a remote email that would have 
>>> already locally registered. This is much more tricky because 
>>> RefreshPeer turns out to call sync on a Person object that has no 
>>> peer_id set yet.
>>>
>>> So please do not waste too much time at this stage on 'how' to 
>>> implement this, I'll have to spend more time on this next week.
>>> But I'd still like your inputs on how this should be tested, because 
>>> it clearly could have weird side-effects.
>>>
>>>
>>> In the meanwhile, there definitely is something that you might 
>>> consider doing on your side, in Nodes.py
>>>
>>>     validate_date_created = Row.validate_timestamp
>>>     validate_last_updated = Row.validate_timestamp
>>> +    validate_last_contact = Row.validate_timestamp
>>>
>>>     def update_last_contact(self, commit = True):
>>>
>>> which allows the last_contact field that you recently added, to get 
>>> correctly refreshed from the integer-date we get through GetpeerData 
>>> and xmlrpc.
>>>
>>> By the way, I also plan on reviewing the set of data that flows 
>>> through GetPeerData.
>>> It turns out this last_contact field gets cached togother with a lot 
>>> more data that is definitely not crucial to federation, and
>>> I am pretty sure I can substantially improve the overall 
>>> performance. This is another story, but until I do this review 
>>> you'll need this patch so RefreshPeer can work on your side.
>>>
>>> -- Thierry
>>>
>>>
>>> Thierry Parmentelat wrote:
>>>> Hi Tony and all
>>>>
>>>> I am currently trying to address the following issue.
>>>> The current status of the api & RefreshPeer thing has a major 
>>>> drawback. During the transitional period where European users will 
>>>> start switching from the Public PlanetLab at PLC to 
>>>> PlanetLabEurope, there may be a need for a user to be registered at 
>>>> both myplc's at the same time.
>>>> In this particular case, the current software does not allow the 
>>>> user to use the same email address on both ends. The reasons for 
>>>> this are that
>>>> - AddPerson uses validate_email in Persons.py, that in turn checks 
>>>> for any Person with that new email address and raises an exception 
>>>> if such an entry can be found
>>>> - RefreshPeer basically uses the same validate_email code when it 
>>>> tries to sync newly created local objects
>>>> -  Note however that no hard-wired constraints in the DB is enabled 
>>>> here
>>>>
>>>>
>>>> I am currently validating the following patch in PLC/Persons.py
>>>>         if len(domain) < 2:
>>>>             raise invalid_email
>>>>
>>>> -        conflicts = Persons(self.api, [email])
>>>> +       # Thierry - June 8 2007
>>>> +       # check only against users on the same peer
>>>> +       # xxx should we protect against cases where self does not 
>>>> hold a 'peer_id' key ?
>>>> +        conflicts = Persons(self.api, 
>>>> {'email':email,'peer_id':self['peer_id']})
>>>>         for person in conflicts:
>>>>             if 'person_id' not in self or self['person_id'] != 
>>>> person['person_id']:
>>>>
>>>> As you see I just alter the uniqueness check by relaxing the 
>>>> constraint, and prevent email duplication on a given peer only.
>>>>
>>>> I am having a hard time setting up a reasonable test environment to 
>>>> check this feature as thoroughly as possible, so I cannot commit on 
>>>> its correctness yet.
>>>> I would appreciate to know your feelings about it anyway, that 
>>>> might save me some time especially if this is a wrong track. So:
>>>> - first I'd like to know whether you feel comfortable with that 
>>>> patch and hear your comments if any. In particular do you think 
>>>> there might be cases where validate_email can be called on a 'self' 
>>>> that would not hold a 'peer_id' filed ?
>>>> - and second, during my test so far I've noticed that this same 
>>>> validate_email thing was also called when performing deletions. I 
>>>> assume this applies to all types of objects. My feeling is that 
>>>> this sounds awkward - for instance if one changes the validation 
>>>> for a given entity to make it more strict, then injecting the new 
>>>> code will prevent us from cleanly deleting entities that do not 
>>>> fulfill the validation criteria - which are precisely the ones we'd 
>>>> like to delete. This is of course much independant from the central 
>>>> issue I am addressing here, but your comments about this would be 
>>>> appreciated too.
>>>>
>>>>
>>>> Thanks -- Thierry
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel at lists.planet-lab.org
>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at lists.planet-lab.org
>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>
>> _______________________________________________
>> Devel mailing list
>> Devel at lists.planet-lab.org
>> https://lists.planet-lab.org/mailman/listinfo/devel
>
> _______________________________________________
> Devel mailing list
> Devel at lists.planet-lab.org
> https://lists.planet-lab.org/mailman/listinfo/devel



More information about the Devel mailing list