[Planetlab-devel] inter-plc federation -
email uniqueness - proposed patch & discussion
Thierry Parmentelat
Thierry.Parmentelat at sophia.inria.fr
Thu Jun 14 04:07:12 EDT 2007
Following up on this topic:
I have been testing this change on both a real deployment -- the one
that runs the peering trial with the public PLC -- and in vitro -- on a
dedicated couple of myplcs that run the TestPeers.py code -- and have
not seen any drawback with this patch.
I would appreciate an update from Princeton about your position on this
proposal, and I would be grateful if this change could make its way
until your production code at some point, so I could relax the
constraint on email for our users (so far we have been using fake
aliases in the @one-lab.org domain).
-- Thierry
Thierry Parmentelat wrote:
> Hi Tony
>
> I think I have a working version for this email-namespace problem
> Please find below the patch I am suggesting -- I have not mentioned
> the change in Nodes.py (about validate_last_contact) since I think you
> already merged it in the princeton codebase.
>
> The only additional trick is to artificially update the instances of
> persons as returned from GetPeerData. More specifically, GetPeerData
> returns person dicts where peer_id is set to None (since we've asked
> for the peering end's local persons), so I just need to set it to the
> local peer_id that depicts the peering end, and use the idea that I
> described last week, about having validate_email check for duplicates
> on the candidate person's peer_id only.
>
> ==
> As I said I am now going to review GetPeerData, and try and narrow the
> scope of the fields that get cached for federation. For instance with
> the current code, and since the introduction of last_contact, I have
> almost all nodes that get refreshed during two successive calls to
> RefreshPeer; I suspect this rather unfriendly behaviour to be due to
> last_contact being changed frequently (how often is that supposed to
> be, btw ?)
> I hope this review can also drastically improve the overall
> performance of RefreshPeer -- at least by reducing the size of the xml
> traffic -- but to that end I am back to a decent federation test setup
> so this will take a while.
> So more on this later.
> ==
>
> I've been thinking about the possible implications of this change on
> the web pages, but as far as I can tell at this stage, there should
> not be any in fact.
>
> -- Thierry
>
> $ svn diff
> Index: PLC/Persons.py
> ===================================================================
> --- PLC/Persons.py (revision 492)
> +++ PLC/Persons.py (working copy)
> @@ -100,7 +100,13 @@
> if len(domain) < 2:
> raise invalid_email
>
> - conflicts = Persons(self.api, [email])
> + # Thierry - June 11 2007
> + # check only against users on the same peer
> + if 'peer_id' in self:
> + namespace_peer_id = self['peer_id']
> + else:
> + namespace_peer_id = None
> + conflicts = Persons(self.api,
> {'email':email,'peer_id':namespace_peer_id})
> for person in conflicts:
> if 'person_id' not in self or self['person_id'] !=
> person['person_id']:
> raise PLCInvalidArgument, "E-mail address already in use"
> Index: PLC/Methods/RefreshPeer.py
> ===================================================================
> --- PLC/Methods/RefreshPeer.py (revision 492)
> +++ PLC/Methods/RefreshPeer.py (working copy)
> @@ -228,6 +228,10 @@
>
> # Keyed on foreign person_id
> old_peer_persons = Persons(self.api, {'peer_id': peer_id},
> columns).dict('peer_person_id')
> + # artificially attach the persons returned by GetPeerData to
> the new peer
> + # this is because validate_email needs peer_id to be correct
> when checking for duplicates
> + for person in peer_tables['Persons']:
> + person['peer_id']=peer_id
> persons_at_peer = dict([(peer_person['person_id'], peer_person) \
> for peer_person in
> peer_tables['Persons']])
>
>
>
> Tony Mack wrote:
>> Hi Thierry,
>>
>> It seems this issue is a little tricky, but I think you are headed in
>> the right direction. I will be bringing up an alternate plc soon,
>> which will act as a test environment for new development. We can use
>> this alternate plc to help us test these federation issues.
>>
>> BTW, I will update Nodes.py to validate last_contact, thanks for
>> pointing this out.
>>
>> -- Tony
>>
>> Thierry Parmentelat wrote:
>>> Tony
>>> I did some progress on this topic:
>>> - the patch I sent does not work as is because as I suspected there
>>> are cases where 'peer_id' is not provided at all to validate_email
>>> - the protected version where we filter against peer_id when it is
>>> provided, or against peer_id=None when it's not, is half good. With
>>> this version I can locally add an e-mail address that is already
>>> present from another peer
>>> - but I cannot successfully cache a remote email that would have
>>> already locally registered. This is much more tricky because
>>> RefreshPeer turns out to call sync on a Person object that has no
>>> peer_id set yet.
>>>
>>> So please do not waste too much time at this stage on 'how' to
>>> implement this, I'll have to spend more time on this next week.
>>> But I'd still like your inputs on how this should be tested, because
>>> it clearly could have weird side-effects.
>>>
>>>
>>> In the meanwhile, there definitely is something that you might
>>> consider doing on your side, in Nodes.py
>>>
>>> validate_date_created = Row.validate_timestamp
>>> validate_last_updated = Row.validate_timestamp
>>> + validate_last_contact = Row.validate_timestamp
>>>
>>> def update_last_contact(self, commit = True):
>>>
>>> which allows the last_contact field that you recently added, to get
>>> correctly refreshed from the integer-date we get through GetpeerData
>>> and xmlrpc.
>>>
>>> By the way, I also plan on reviewing the set of data that flows
>>> through GetPeerData.
>>> It turns out this last_contact field gets cached togother with a lot
>>> more data that is definitely not crucial to federation, and
>>> I am pretty sure I can substantially improve the overall
>>> performance. This is another story, but until I do this review
>>> you'll need this patch so RefreshPeer can work on your side.
>>>
>>> -- Thierry
>>>
>>>
>>> Thierry Parmentelat wrote:
>>>> Hi Tony and all
>>>>
>>>> I am currently trying to address the following issue.
>>>> The current status of the api & RefreshPeer thing has a major
>>>> drawback. During the transitional period where European users will
>>>> start switching from the Public PlanetLab at PLC to
>>>> PlanetLabEurope, there may be a need for a user to be registered at
>>>> both myplc's at the same time.
>>>> In this particular case, the current software does not allow the
>>>> user to use the same email address on both ends. The reasons for
>>>> this are that
>>>> - AddPerson uses validate_email in Persons.py, that in turn checks
>>>> for any Person with that new email address and raises an exception
>>>> if such an entry can be found
>>>> - RefreshPeer basically uses the same validate_email code when it
>>>> tries to sync newly created local objects
>>>> - Note however that no hard-wired constraints in the DB is enabled
>>>> here
>>>>
>>>>
>>>> I am currently validating the following patch in PLC/Persons.py
>>>> if len(domain) < 2:
>>>> raise invalid_email
>>>>
>>>> - conflicts = Persons(self.api, [email])
>>>> + # Thierry - June 8 2007
>>>> + # check only against users on the same peer
>>>> + # xxx should we protect against cases where self does not
>>>> hold a 'peer_id' key ?
>>>> + conflicts = Persons(self.api,
>>>> {'email':email,'peer_id':self['peer_id']})
>>>> for person in conflicts:
>>>> if 'person_id' not in self or self['person_id'] !=
>>>> person['person_id']:
>>>>
>>>> As you see I just alter the uniqueness check by relaxing the
>>>> constraint, and prevent email duplication on a given peer only.
>>>>
>>>> I am having a hard time setting up a reasonable test environment to
>>>> check this feature as thoroughly as possible, so I cannot commit on
>>>> its correctness yet.
>>>> I would appreciate to know your feelings about it anyway, that
>>>> might save me some time especially if this is a wrong track. So:
>>>> - first I'd like to know whether you feel comfortable with that
>>>> patch and hear your comments if any. In particular do you think
>>>> there might be cases where validate_email can be called on a 'self'
>>>> that would not hold a 'peer_id' filed ?
>>>> - and second, during my test so far I've noticed that this same
>>>> validate_email thing was also called when performing deletions. I
>>>> assume this applies to all types of objects. My feeling is that
>>>> this sounds awkward - for instance if one changes the validation
>>>> for a given entity to make it more strict, then injecting the new
>>>> code will prevent us from cleanly deleting entities that do not
>>>> fulfill the validation criteria - which are precisely the ones we'd
>>>> like to delete. This is of course much independant from the central
>>>> issue I am addressing here, but your comments about this would be
>>>> appreciated too.
>>>>
>>>>
>>>> Thanks -- Thierry
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel at lists.planet-lab.org
>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at lists.planet-lab.org
>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>
>> _______________________________________________
>> Devel mailing list
>> Devel at lists.planet-lab.org
>> https://lists.planet-lab.org/mailman/listinfo/devel
>
> _______________________________________________
> Devel mailing list
> Devel at lists.planet-lab.org
> https://lists.planet-lab.org/mailman/listinfo/devel
More information about the Devel
mailing list