Mailman 3 December 1999 - Python-Dev

Better text processing support in py2k?
by Skip Montanaro Jan. 3, 2000

It just occurred to me as I was replying to a request on the main list, that Python's text handling capabilities could be a bit better than they are. This will probably not come as a revelation to many of you, but I finally put it together with the standard argument against beefing things up:

    One fix would be to add regular expressions to the language core and have special syntax for them, as Perl has done. However, I don't like this solution because Python is a general-purpose language, and regular expressions are used for the single application domain of text processing. For other application domains, regular expressions may be of no interest, and you might want to remove them to save memory and code size.

and the observation that Python does support some builtin objects and syntax that are fairly specific to some much more restricted application domains than text processing.

I stole the above quote from Andrew Kuchling's Python Warts page, which I also happened to read earlier today. What AMK says makes perfect sense until you examine some of the other things that are in the language, like the Ellipsis object and complex numbers. If I recall correctly both were added as a result of the NumPy package development.

I have nothing against ellipses or complex numbers. They are fine first class objects that should remain in the language. But I have never used either one in my day-to-day work. On the other hand, I read files and manipulate them with regular expressions all the time. I rather suspect that more people use Python for some sort of text processing than any other single application domain. Python should be good at it.

While I don't want to turn Python into Perl, I would like to see it do a better job of what most people probably use the language for. Here is a very short list of things I think need attention:

1. When using something like the simple file i/o idiom

       for line in f.readlines():
           dofunstuff(line)

   the programmer should not have to care how big the file is. It should just work in a reasonably efficient manner without gobbling up all of memory. I realize this may require some change to the syntax of the common idiom.

2. The re module needs to be sped up, if not to catch up with Perl, then to catch up with the deprecated regex module. Depending how far people want to go with things, adding some language syntax to support regular expressions might be in order. I don't see that as compelling as adding complex numbers however. Another possibility, now that Barry Warsaw has opened the floodgates, is to add regular expression methods to strings.

3. I've not yet used it, but I am told the pattern matching in Marc-Andre Lemburg's mxTextTools (http://starship.python.net/crew/lemburg/) is both powerful and efficient (though it certainly appears complex). Perhaps it deserves consideration for incorporation into the core Python distribution.

I'm sure other people will come up with other suggestions.

Skip Montanaro | http://www.mojam.com/
skip(a)mojam.com | http://www.musi-cal.com/
847-971-7098 | Python: Programming the way Guido indented...
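Item 1 in the list above (processing a file line by line without holding all of it in memory) can be sketched roughly as follows. This is only an illustration, assuming the `readlines(sizehint)` form of the file method; the helper names `iter_lines` and `count_words` are hypothetical and not part of any proposal in the thread.

```python
import io


def iter_lines(f, sizehint=64 * 1024):
    """Yield lines from a file-like object, reading roughly sizehint at a time.

    Hypothetical helper: only one batch of lines is held in memory at once.
    """
    while True:
        batch = f.readlines(sizehint)  # stops once about sizehint has been read
        if not batch:
            break
        for line in batch:
            yield line


def count_words(f):
    """Example consumer that never needs the whole file at once."""
    total = 0
    for line in iter_lines(f):
        total += len(line.split())
    return total


# io.StringIO stands in for a real (possibly very large) file.
sample = io.StringIO("one two\nthree\nfour five six\n")
word_count = count_words(sample)
```

In current Python releases, iterating over the file object directly (`for line in f:`) gives the same lazy behavior without a helper.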

Re: [Distutils] Questions about distutils strategy
by Guido van Rossum Jan. 2, 2000

[Great analysis, Tim!]

> 4) The audience is Python end-users "in general", and the product is pure
> Python. I think this is the most important one for Distutils to address,
> and compilation isn't a part of it. So far, though, what Gordon is doing
> seems more appropriate than what Distutils has been up to. I hope his work
> gets folded into this.

I'm not sure what stuff by which Gordon you're referring to. I am only familiar with his installer, which I thought was win32 only (but I may be mistaken) and is an installer for a whole application, not just a bunch of modules. Please correct me if I'm wrong.

But this reminds me of a different issue, which Jim Ahlstrom has been hammering on before: there's a completely separate set of cases where what you are distributing is a stand-alone application, and the target consists of end users who are entirely uninterested in whether it's written in Python, C or Elvish. (And then there's still the distinction between Win32, Unix or both.)

The current distutils tools don't deal with this at all. I think they should, though, and I think the framework is powerful enough to be able to add this, e.g. as a new "appdist" command.

--Guido van Rossum (home page: http://www.python.org/~guido/)

Fixed-decimal types
by Jim Fulton Dec. 30, 1999

While on the subject of RDBMS systems, a common need is to be able to work with fixed-decimal data. I think a standard Python fixed-decimal type would help to make Python database interfaces a lot more robust.

I even wonder if the Python long type might be hijacked for this purpose by adding a "scale" that indicates the number of digits to the right of the decimal point. For example, an expression like:

    1000000000.2500L

would create a fixed-decimal number with a scale of 4.

People have built Python classes for fixed-decimal types, but when working with RDBMS data, one often deals with lots of data and efficiency matters. I also suspect that adding scale to longs wouldn't be that hard and would be a fairly natural extension.

In any case, a "standard" (being in the standard library would be sufficient) fixed-decimal type would probably lead to better database interfaces that (at least more) properly handled fixed-decimal data.

Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!
Technical Director   (888) 344-4332            http://www.python.org
Digital Creations    http://www.digicool.com   http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
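The "long plus a scale" idea in the message above can be illustrated with a small pure-Python sketch. It is not the proposed built-in type: the class name `FixedDecimal` and its methods are hypothetical, and it only covers parsing, addition and printing, assuming an arbitrary-precision integer holds all the digits.

```python
class FixedDecimal:
    """Hypothetical fixed-decimal value: an integer plus a decimal scale."""

    def __init__(self, unscaled, scale):
        self.unscaled = unscaled  # every digit, as an arbitrary-precision int
        self.scale = scale        # number of digits right of the decimal point

    @classmethod
    def parse(cls, text):
        """Build a value from text such as '1000000000.2500' (scale 4)."""
        whole, _, frac = text.partition(".")
        return cls(int(whole + frac), len(frac))

    def _rescaled(self, scale):
        """Return the unscaled integer expressed at a larger-or-equal scale."""
        return self.unscaled * 10 ** (scale - self.scale)

    def __add__(self, other):
        # Align both operands to the larger scale so no digits are lost.
        scale = max(self.scale, other.scale)
        return FixedDecimal(self._rescaled(scale) + other._rescaled(scale), scale)

    def __str__(self):
        if self.scale == 0:
            return str(self.unscaled)
        sign = "-" if self.unscaled < 0 else ""
        digits = str(abs(self.unscaled)).rjust(self.scale + 1, "0")
        return sign + digits[:-self.scale] + "." + digits[-self.scale:]


value = FixedDecimal.parse("1000000000.2500")
total = FixedDecimal.parse("1.5") + FixedDecimal.parse("2.25")
```

Here `value` keeps its four digits of scale when printed, and `total` is carried at the larger of the two operand scales.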

Re: [Python-Dev] Better text processing support in py2k?
by Andy Robinson Dec. 29, 1999

--- Skip Montanaro <skip(a)mojam.com> wrote:
> fast/memory-intensive/clear
> slow/memory-conserving/not-as-clear
> fast/memory-conserving/fairly-muddy
>
> Any particular reason that the readline method can't
> return an iterator that supports __getitem__ and buffers
> input? (Again, remember this is for py2k, so the potential
> breakage such a change might cause is a consideration, but
> not a showstopper.)

Why not generalize fileinput to do buffering instead?

More generally, Java has the notion of 'stackable streams' - e.g. construct a 'BufferedFile' around a 'File', maybe construct a 'Line-oriented file' around that etc. Each one takes a file-like object as an argument to the constructor. Things you might want to do:

- buffering
- international encoding conversions
- line delimiters other than CR/LF/CRLF
- read/write Python objects (i.e. use pickle/marshal)
- easy interfaces to parsers

This took me a couple of hours to get used to (and at the time I thought 'Yuk!' when I first saw four nested constructors), but it gives you very precise control and a lot of versatility when handling files. It's an idiom Python does not use much but maybe it should.

I'd argue that maybe some enhancements to fileinput.py - adding some streams to provide building blocks for these operations - would get us the power you want and a lot more versatility besides.

=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.
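The 'stackable streams' idiom described above can be sketched in Python by nesting constructors, each taking a file-like object. The example below is only an illustration: it uses the standard `io` classes for the buffering and encoding layers, and `DelimitedRecords` is a hypothetical extra layer for non-standard delimiters, not an existing library class or a proposed fileinput API.

```python
import io


class DelimitedRecords:
    """Hypothetical layer: yield records split on an arbitrary delimiter.

    Wraps any object with a read(n) method that returns text.
    """

    def __init__(self, stream, delimiter, blocksize=4096):
        self.stream = stream
        self.delimiter = delimiter
        self.blocksize = blocksize

    def __iter__(self):
        pending = ""
        while True:
            block = self.stream.read(self.blocksize)
            if not block:
                break
            pending += block
            # Everything before the last delimiter is complete; keep the tail.
            *records, pending = pending.split(self.delimiter)
            for record in records:
                yield record
        if pending:
            yield pending


# Each layer is constructed around the one below it.
raw = io.BytesIO("caf\xe9;th\xe9;lait".encode("latin-1"))   # byte source
buffered = io.BufferedReader(raw)                            # buffering
text = io.TextIOWrapper(buffered, encoding="latin-1")        # encoding conversion
records = list(DelimitedRecords(text, ";", blocksize=4))     # custom delimiter
```

The small `blocksize` is chosen only to show that records spanning several reads are reassembled correctly.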

Please test new dynamic load behavior
by Greg Stein Dec. 27, 1999

Hi all,

I reorganized Python's dynamic load/import code over the past few days. Guido provided some feedback, I did some more mods, and now it is checked into CVS. The new loading behavior has been tested on Linux, IRIX, and Solaris (and probably Windows by now).

For people with CVS access, I'd like to ask that you grab an updated copy and shake out the new code. There have been updates to the "configure" process, so you'll need to run configure again. Make sure that you alter your Modules/Setup to build some shared modules, and then try it out.

Here are some of the platforms that I believe need specific testing:

- NetBSD, FreeBSD, OpenBSD, ...
- AIX
- HP/UX
- BeOS
- NeXT
- Mac
- OS/2
- Win16

I believe it should work for most people, but we may be looking for the wrong "init<module>" symbol on some platforms. We might even be selecting the wrong import mechanism (or missing it altogether!) on some platforms.

If you get a chance to test this, then please drop me a note with your platform and whether it succeeded or failed (and how it failed).

Thanx!
-g

p.s. you can tell if dynamic loading is missing by watching for DYNLOADFILE in the configure process and seeing if it used dynload_stub. Alternatively, you can import the "imp" module and see if "load_dynamic" is missing.

--
Greg Stein, http://www.lyra.org/

str(1L) -> '1' ?
by Jim Fulton Dec. 27, 1999

In November there was an interesting discussion on comp.lang.python about the meaning of __str__ and __repr__. One tidbit that came out of this discussion was that __str__ for longs should drop the trailing 'L'. Was there a decision on this?

I'd really like this to happen. We do a lot of work with RDBMS systems and long integers seem to come up a lot with these systems (as do other fixed-decimal numbers, but that's another topic ;). For example, our latest Sybase and Oracle support in Zope returns long integers for RDBMS types like NUMBER(10,0). The trailing 'L' in the string representation is causing us some headaches. This also seems to be an issue when using the current standard ODBC interface with Oracle, as indicated in a DB-SIG post today.

Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!
Technical Director   (888) 344-4332            http://www.python.org
Digital Creations    http://www.digicool.com   http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.

Re: [Python-Dev] Fixed Decimal types
by Jack Jansen Dec. 27, 1999

> (1) I think there is great mileage in combining the
> fixed-decimal concept with Martin Fowler's Quantity
> pattern, so that a variable could be defined as not
> just two decimal places but also (say) "GBP" or "USD",
> and it would be an error to add the two. Same applies
> for adding metres, kilograms and other quantities.
> There has also been discussion that the 'type' of a
> quantity should determine what math should apply.

Isn't this something that is ideally suited for implementation in a Python module, based on a core implementation of fixed decimal numbers?

--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen(a)oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

Re: [Python-Dev] Fixed Decimal types
by Andy Robinson Dec. 24, 1999

> >> However, unlike RPG, we should probably ensure
> >> that attempts to overflow or underflow the scale
> >> result in NaN or Overflow conditions, rather
> >> than assuming the user is right and losing
> >> the significant digits.
>
> > Since this would be based on infinite-precision numbers, I don't
> > think that this would be an issue.

Three very general observations before I disappear for Christmas:

(1) I think there is great mileage in combining the fixed-decimal concept with Martin Fowler's Quantity pattern, so that a variable could be defined as not just two decimal places but also (say) "GBP" or "USD", and it would be an error to add the two. Same applies for adding metres, kilograms and other quantities. There has also been discussion that the 'type' of a quantity should determine what math should apply.

(2) If Python is going to be used increasingly in eCommerce, it should be good at dealing with money - maybe not in the core language, but we should aim for one standard package.

(3) We have a python-finance list (python-finance(a)egroups.com), recently generalized to cover business systems, which is a good place to discuss this if anyone wants to. There are people there who have time, would love to prototype something (indeed some work started in this area 3 months back), and would use it at work too. This would be an ideal first target for that group - or indeed for a finance-sig.

I'll pursue this in the New Year.

Merry Christmas,

Andy

=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.
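Point (1) above, a fixed-decimal amount tagged with a unit so that mixing units is an error, might look roughly like this. It is a sketch only: the `Money` class is hypothetical, it uses the standard `decimal` module as a stand-in for a core fixed-decimal type, and it assumes two decimal places for every currency.

```python
from decimal import Decimal


class Money:
    """Hypothetical Quantity-style value: a two-place amount plus a currency."""

    def __init__(self, amount, currency):
        # Assumption: every currency is kept at exactly two decimal places.
        self.amount = Decimal(amount).quantize(Decimal("0.01"))
        self.currency = currency

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError(
                "cannot add %s to %s" % (other.currency, self.currency)
            )
        return Money(self.amount + other.amount, self.currency)

    def __repr__(self):
        return "Money(%r, %r)" % (str(self.amount), self.currency)


total = Money("10.00", "GBP") + Money("2.50", "GBP")

try:
    Money("10.00", "GBP") + Money("10.00", "USD")
    mixed_error = None
except ValueError as exc:
    mixed_error = exc  # adding GBP to USD is rejected
```

The same shape would extend to metres or kilograms by replacing the currency tag with a unit tag.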

Re: [Python-checkins] CVS: python/dist/src/Lib getopt.py,1.7,1.8
by Fred L. Drake, Jr. Dec. 24, 1999

Guido van Rossum writes:
> +
> + class GetoptError(Exception):
> +     opt = ''
> +     msg = ''
> +     def __init__(self, *args):
> +         self.args = args
> +         if len(args) == 1:
> +             self.msg = args[0]
> +         elif len(args) == 2:
> +             self.msg = args[0]
> +             self.opt = args[1]
> +
> +     def __str__(self):
> +         return self.msg
>
> ! error = GetoptError # backward compatibility

This breaks as soon as the standard exceptions are strings; does this mean -X will be removed in the next release? (Please????)

-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
Corporation for National Research Initiatives
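For illustration, this is roughly how calling code can use the class-based exception from the quoted patch. The sketch assumes the `getopt` module as shipped in current Python releases, where `GetoptError` carries the `msg` and `opt` attributes shown in the diff; the helper `parse_args` and the option names are made up for the example.

```python
import getopt


def parse_args(argv):
    """Return (options, remaining_args), or (None, error) on a bad option.

    Hypothetical helper; accepts only -v / --verbose.
    """
    try:
        opts, rest = getopt.getopt(argv, "v", ["verbose"])
    except getopt.GetoptError as err:
        # err.msg is the human-readable text; err.opt names the bad option.
        return None, err
    return opts, rest


ok_opts, ok_rest = parse_args(["-v", "input.txt"])
bad_opts, bad_err = parse_args(["--bogus"])
```

Because the exception is a class, a caller can catch it by type and inspect its attributes, which a string exception would not allow.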

Re: [Python-Dev] Date and timetypes (was: Fixed-decimal types)
by Andy Robinson Dec. 24, 1999

Sorry, should have replied to the list...

--- Andy Robinson <captainrobbo(a)yahoo.com> wrote:
> Date: Thu, 23 Dec 1999 08:37:18 -0800 (PST)
> From: Andy Robinson <captainrobbo(a)yahoo.com>
> Reply-to: andy(a)robanal.demon.co.uk
> Subject: Re: [Python-Dev] Date and timetypes (was: Fixed-decimal types)
> To: Guido van Rossum <guido(a)CNRI.Reston.VA.US>
>
> --- Guido van Rossum <guido(a)CNRI.Reston.VA.US> wrote:
> > I don't know much about date/time types, or about mxDateTime.
> > My intuition is that there are too many ways to do it, and that
> > being compatible with commercial databases may not be the
> > right way to do it for core Python.
>
> OK. Let me rephrase it. Say we form a consensus on
> 'the right way'. Are you amenable to some solution
> which goes back before 1970 and after 2038 going into
> the standard library?
>
> And does your answer change if it involves some
> compiled code as well?
>
> I mention mxDateTime because it was agreed by a Python
> SIG, is mature and stable, and I find it very useful.
> And the core type is pretty small - much of the helper
> stuff in the package now could be kept separate from
> the main Python distribution.
>
> - Andy

=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.
