It just occurred to me as I was replying to a request on the main list, that
Python's text handling capabilities could be a bit better than they are.
This will probably not come as a revelation to many of you, but I finally
put it together with the standard argument against beefing things up:
    One fix would be to add regular expressions to the language core and
    have special syntax for them, as Perl has done. However, I don't like
    this solution because Python is a general-purpose language, and regular
    expressions are used for the single application domain of text
    processing. For other application domains, regular expressions may be of
    no interest, and you might want to remove them to save memory and code
    size.
and the observation that Python does support some builtin objects and syntax
that are fairly specific to some much more restricted application domains
than text processing.
I stole the above quote from Andrew Kuchling's Python Warts page, which I
also happened to read earlier today.
What AMK says makes perfect sense until you examine some of the other things
that are in the language, like the Ellipsis object and complex numbers. If
I recall correctly, both were added as a result of the NumPy package
development.
I have nothing against ellipses or complex numbers. They are fine first
class objects that should remain in the language. But I have never used
either one in my day-to-day work. On the other hand, I read files and
manipulate them with regular expressions all the time. I rather suspect
that more people use Python for some sort of text processing than any other
single application domain. Python should be good at it.
While I don't want to turn Python into Perl, I would like to see it do a
better job of what most people probably use the language for. Here is a
very short list of things I think need attention:
1. When using something like the simple file i/o idiom
       for line in f.readlines():
           dofunstuff(line)
the programmer should not have to care how big the file is. It
should just work in a reasonably efficient manner without gobbling up
all of memory. I realize this may require some change to the syntax
of the common idiom.
2. The re module needs to be sped up, if not to catch up with Perl, then
to catch up with the deprecated regex module. Depending on how far
people want to go with things, adding some language syntax to support
regular expressions might be in order. I don't see that as
compelling as adding complex numbers, however. Another possibility,
now that Barry Warsaw has opened the floodgates, is to add regular
expression methods to strings.
3. I've not yet used it, but I am told the pattern matching in
Marc-Andre Lemburg's mxTextTools
(http://starship.python.net/crew/lemburg/) is both powerful and
efficient (though it certainly appears complex). Perhaps it deserves
consideration for incorporation into the core Python distribution.
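As a rough illustration of item 1, here is a minimal sketch of a loop that keeps the per-line style of the idiom while reading the file in bounded chunks. It relies on the optional size hint accepted by readlines(); the helper name for_each_line and the default chunk size are assumptions made for this example, not anything proposed in the thread.

```python
import io


def for_each_line(f, handle, sizehint=64 * 1024):
    """Call handle(line) for every line of f, holding roughly sizehint
    bytes of lines in memory at a time instead of the whole file."""
    while True:
        lines = f.readlines(sizehint)  # returns [] once the file is exhausted
        if not lines:
            break
        for line in lines:
            handle(line)


# Usage with an in-memory file; a tiny hint forces several chunks.
seen = []
for_each_line(io.StringIO("a\nb\nc\n"), seen.append, sizehint=2)
print(seen)
```

The caller still writes one handler per line; only the chunking moves into the helper.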
I'm sure other people will come up with other suggestions.
Skip Montanaro | http://www.mojam.com/
skip(a)mojam.com | http://www.musi-cal.com/
847-971-7098 | Python: Programming the way Guido indented...
[Great analysis, Tim!]
> 4) The audience is Python end-users "in general", and the product is pure
> Python. I think this is the most important one for Distutils to address,
> and compilation isn't a part of it. So far, though, what Gordon is doing
> seems more appropriate than what Distutils has been up to. I hope his work
> gets folded into this.
I'm not sure what stuff by which Gordon you're referring to. I am
only familiar with his installer, which I thought is win32 only (but
I may be mistaken) and is an installer for a whole application, not
just a bunch of modules. Please correct me if I'm wrong.
But this reminds me of a different issue, which Jim Ahlstrom has been
hammering on before: there's a completely separate set of cases
where what you are distributing is a stand-alone application, and the
target consists of end users who are entirely uninterested in whether
it's written in Python, C or Elvish. (And then there's still the
distinction between Win32, Unix or both.) The current distutils tools
don't deal with this at all. I think they should, though, and I think
the framework is powerful enough to be able to add this, e.g. as a new
"appdist" command.
--Guido van Rossum (home page: http://www.python.org/~guido/)
While on the subject of RDBMS systems, a common need is to be able to
work with fixed-decimal data. I think a standard Python fixed-decimal
type would help to make Python database interfaces a lot more robust.
I even wonder if the Python long type might be hijacked for this purpose
by adding a "scale" that indicates the number of digits to the right
of the decimal point. For example, an expression like:
1000000000.2500L
would create a fixed decimal number with a scale of 4.
People have built Python classes for fixed-decimal
types, but when working with RDBMS data, one often deals with
lots of data and efficiency matters. I also suspect that adding
scale to longs wouldn't be that hard and would be a fairly natural
extension.
In any case, a "standard" (being in the standard library would
be sufficient) fixed-decimal type would probably lead to better
database interfaces that (at least more) properly handled
fixed-decimal data.
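To make the "integer plus scale" idea concrete, here is a minimal sketch of a fixed-decimal value stored as an arbitrary-precision integer together with the number of digits to the right of the decimal point. The class name FixedDecimal, its parse() helper, and the choice to widen to the larger scale on addition are all assumptions for illustration; the message above does not specify any of them.

```python
class FixedDecimal:
    """Hypothetical fixed-decimal value: `units` counts steps of 10**-scale."""

    def __init__(self, units, scale):
        self.units = units  # arbitrary-precision integer
        self.scale = scale  # digits to the right of the decimal point

    @classmethod
    def parse(cls, text):
        # "1000000000.2500" -> units=10000000002500, scale=4
        whole, _, frac = text.partition(".")
        return cls(int(whole + frac), len(frac))

    def _units_at(self, scale):
        # Re-express the value at a wider scale without losing digits.
        return self.units * 10 ** (scale - self.scale)

    def __add__(self, other):
        scale = max(self.scale, other.scale)
        return FixedDecimal(self._units_at(scale) + other._units_at(scale), scale)

    def __str__(self):
        sign = "-" if self.units < 0 else ""
        digits = str(abs(self.units)).rjust(self.scale + 1, "0")
        if self.scale == 0:
            return sign + digits
        return sign + digits[:-self.scale] + "." + digits[-self.scale:]


price = FixedDecimal.parse("1000000000.2500")
print(price.scale)                               # 4
print(price + FixedDecimal.parse("0.25"))        # 1000000000.5000
```

All arithmetic stays in integers, so no binary floating-point rounding is involved.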
Jim
--
Jim Fulton mailto:jim@digicool.com Python Powered!
Technical Director (888) 344-4332 http://www.python.org
Digital Creations http://www.digicool.com http://www.zope.org
Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission. Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.
--- Skip Montanaro <skip(a)mojam.com> wrote:
> fast/memory-intensive/clear
> slow/memory-conserving/not-as-clear
> fast/memory-conserving/fairly-muddy
>
> Any particular reason that the readline method can't
> return an iterator that
> supports __getitem__ and buffers input? (Again,
> remember this is for py2k,
> so the potential breakage such a change might cause
> is a consideration, but
> not a showstopper.)
Why not generalize fileinput to do buffering instead?
More generally, Java has the notion of 'stackable
streams' - e.g. construct a 'BufferedFile' around a
'File', maybe construct a 'Line-oriented file' around
that etc. Each one takes a file-like object as an
argument to the constructor. Things you might want to
do:
- buffering
- international encoding conversions
- line delimiters other than CR/LF/CRLF
- read/write Python objects (i.e. use pickle/marshal)
- easy interfaces to parsers
This took me a couple of hours to get used to (and at
the time I thought 'Yuk!' when I first saw four
nested constructors), but gives you very precise
control and a lot of versatility when handling files.
It's an idiom Python does not use much but maybe it
should.
I'd argue that maybe some enhancements to fileinput.py
- adding some streams to provide building blocks for
these operations - would get us the power you want and
a lot more versatility besides.
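A minimal sketch of the stackable-stream idiom in Python, assuming two hypothetical wrapper classes (DecodingReader and RecordReader are names invented for this example). Each wrapper takes a file-like object in its constructor and adds one capability from the list above: encoding conversion, then records split on a custom delimiter.

```python
import codecs
import io


class DecodingReader:
    """Wraps a byte stream; read(size) returns decoded text ('' at EOF)."""

    def __init__(self, stream, encoding):
        self.stream = stream
        self.decoder = codecs.getincrementaldecoder(encoding)()

    def read(self, size):
        while True:
            data = self.stream.read(size)
            text = self.decoder.decode(data, final=not data)
            # Keep reading if a chunk ended in the middle of a character.
            if text or not data:
                return text


class RecordReader:
    """Wraps a text stream; yields records split on an arbitrary delimiter."""

    def __init__(self, stream, delimiter, chunk_size=4096):
        self.stream = stream
        self.delimiter = delimiter
        self.chunk_size = chunk_size
        self.pending = ""

    def readrecord(self):
        """Return the next record without its delimiter, or None at EOF."""
        while self.delimiter not in self.pending:
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                record, self.pending = self.pending, ""
                return record or None
            self.pending += chunk
        record, self.pending = self.pending.split(self.delimiter, 1)
        return record


# Stack the wrappers: bytes -> decoded text -> "|"-delimited records.
raw = io.BytesIO("caf\u00e9|na\u00efve|end".encode("utf-8"))
reader = RecordReader(DecodingReader(raw, "utf-8"), "|", chunk_size=4)
records = []
while True:
    record = reader.readrecord()
    if record is None:
        break
    records.append(record)
print(records)
```

Because each layer only relies on the read() method of the layer beneath it, the layers can be reordered or replaced independently.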
=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.
Hi all,
I reorganized Python's dynamic load/import code over the past few days.
Guido provided some feedback, I did some more mods, and now it is checked
into CVS. The new loading behavior has been tested on Linux, IRIX, and
Solaris (and probably Windows by now).
For people with CVS access, I'd like to ask that you grab an updated copy
and shake out the new code. There have been updates to the "configure"
process, so you'll need to run configure again. Make sure that you alter
your Modules/Setup to build some shared modules, and then try it out.
Here are some of the platforms that I believe need specific testing:
- NetBSD, FreeBSD, OpenBSD, ...
- AIX
- HP/UX
- BeOS
- NeXT
- Mac
- OS/2
- Win16
I believe it should work for most people, but we may be looking for the
wrong "init<module>" symbol on some platforms. We might even be selecting
the wrong import mechanism (or missing it altogether!) on some platforms.
If you get a chance to test this, then please drop me a note with your
platform and whether it succeeded or failed (and how it failed).
Thanx!
-g
p.s. you can tell if dynamic loading is missing by watching for
DYNLOADFILE in the configure process and seeing if it used dynload_stub.
alternatively, you can import the "imp" module and see if "load_dynamic"
is missing.
--
Greg Stein, http://www.lyra.org/
In November there was an interesting discussion on comp.lang.python
about the meaning of __str__ and __repr__. One tidbit that came out
of this discussion was that __str__ for longs should drop the trailing
'L'. Was there a decision on this? I'd really like this to happen.
We do a lot of work with RDBMS systems and long integers seem to
come up a lot with these systems (as do other fixed-decimal numbers,
but that's another topic ;). For example, our latest Sybase and
Oracle support in Zope returns long integers for RDBMS types
like NUMBER(10,0). The trailing 'L' in the string representation
is causing us some headaches. This seems also to be an issue when
using the current standard ODBC interface with Oracle, as indicated
in a DB-SIG post today.
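To illustrate why the trailing 'L' matters, here is a small sketch using a hypothetical stand-in class (LongLike is invented for this example, as are the table and column names). It models the behaviour asked for above: repr() keeps the type marker, str() drops it, so values formatted into SQL text come out as plain digits.

```python
class LongLike:
    """Hypothetical stand-in for a long integer returned by a database
    adapter, modelling the split between __repr__ and __str__."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # Unambiguous form for developers: keeps the trailing 'L'.
        return "%dL" % self.value

    def __str__(self):
        # Form for end users and generated text: digits only.
        return "%d" % self.value


row_id = LongLike(1234567890)
print(repr(row_id))   # 1234567890L
print(str(row_id))    # 1234567890

# %s formatting uses str(), so the generated SQL contains no 'L'.
query = "SELECT name FROM customers WHERE id = %s" % row_id
print(query)
```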
Jim
--
Jim Fulton mailto:jim@digicool.com Python Powered!
Technical Director (888) 344-4332 http://www.python.org
Digital Creations http://www.digicool.com http://www.zope.org
> (1) I think there is great mileage in combining the
> fixed-decimal concept with Martin Fowler's Quantity
> pattern, so that a variable could be defined as not
> just two decimal places but also (say) "GBP" or "USD",
> and it would be an error to add the two. Same applies
> for adding metres, kilograms and other quantities.
> There has also been discussion that the 'type' of a
> quantity should determine what math should apply.
Isn't this something that is ideally suited for implementation in a Python
module, based on a core implementation of fixed decimal numbers?
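A minimal sketch of what such a module might look like, using today's standard-library decimal.Decimal as a stand-in for the core fixed-decimal type (an assumption; no such type existed when this was written). The Quantity class and its attribute names are hypothetical.

```python
from decimal import Decimal


class Quantity:
    """Hypothetical Quantity: a decimal amount tagged with a unit or currency."""

    def __init__(self, amount, unit):
        self.amount = Decimal(amount)
        self.unit = unit

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if other.unit != self.unit:
            # Adding GBP to USD (or metres to kilograms) is an error.
            raise TypeError("cannot add %s to %s" % (other.unit, self.unit))
        return Quantity(self.amount + other.amount, self.unit)

    def __repr__(self):
        return "Quantity(%r, %r)" % (str(self.amount), self.unit)


total = Quantity("10.50", "GBP") + Quantity("0.25", "GBP")
print(total)          # Quantity('10.75', 'GBP')

try:
    Quantity("1.00", "GBP") + Quantity("1.00", "USD")
except TypeError as exc:
    print(exc)        # cannot add USD to GBP
```

The unit check lives entirely in the pure-Python wrapper; the underlying number type only has to do exact decimal arithmetic.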
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen(a)oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
> >> However, unlike RPG, we should probably ensure
> >> that attempts to overflow or underflow the scale
> >> result in NaN or Overflow conditions, rather
> >> than assuming the user is right and losing
> >> the significant digits.
>
> > Since this would be based on infinite-precision numbers, I don't
> > think that this would be an issue.
Three very general observations before I disappear for
Christmas:
(1) I think there is great mileage in combining the
fixed-decimal concept with Martin Fowler's Quantity
pattern, so that a variable could be defined as not
just two decimal places but also (say) "GBP" or "USD",
and it would be an error to add the two. Same applies
for adding metres, kilograms and other quantities.
There has also been discussion that the 'type' of a
quantity should determine what math should apply.
(2) If Python is going to be used increasingly in
eCommerce, it should be good at dealing with money -
maybe not in the core language, but we should aim for
one standard package.
(3) We have a python-finance list
(python-finance(a)egroups.com), recently generalized to
cover business systems, which is a good place to
discuss this if anyone wants to. There are people
there who have time, would love to prototype something
(indeed some work started in this area 3 months back),
and would use it at work too. This would be an ideal
first target for that group - or indeed for a
finance-sig. I'll pursue this in the New Year.
Merry Christmas,
Andy
=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.
Guido van Rossum writes:
> +
> + class GetoptError(Exception):
> +     opt = ''
> +     msg = ''
> +     def __init__(self, *args):
> +         self.args = args
> +         if len(args) == 1:
> +             self.msg = args[0]
> +         elif len(args) == 2:
> +             self.msg = args[0]
> +             self.opt = args[1]
> +
> +     def __str__(self):
> +         return self.msg
>
> ! error = GetoptError # backward compatibility
This breaks as soon as the standard exceptions are strings; does
this mean -X will be removed in the next release? (Please????)
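For reference, a short sketch of how calling code can use a class-based exception of this shape. It is written against the getopt module as it ships in the standard library today, where GetoptError carries msg and opt attributes and getopt.error remains an alias; the option strings are made up for the example.

```python
import getopt

caught_opt = None
try:
    # "-x" is not among the declared short options "ab:".
    getopt.getopt(["-x"], "ab:")
except getopt.GetoptError as err:
    caught_opt = err.opt      # the offending option, without the dash
    print(err.msg)            # human-readable message

# Older code that catches getopt.error keeps working through the alias.
print(getopt.error is getopt.GetoptError)
```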
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
Corporation for National Research Initiatives
Sorry, should have replied to the list...
--- Andy Robinson <captainrobbo(a)yahoo.com> wrote:
> Date: Thu, 23 Dec 1999 08:37:18 -0800 (PST)
> From: Andy Robinson <captainrobbo(a)yahoo.com>
> Reply-to: andy(a)robanal.demon.co.uk
> Subject: Re: [Python-Dev] Date and timetypes (was: Fixed-decimal types)
> To: Guido van Rossum <guido(a)CNRI.Reston.VA.US>
>
> --- Guido van Rossum <guido(a)CNRI.Reston.VA.US> wrote:
> > I don't know much about date/time types, or about mxDateTime.
> > My intuition is that there are too many ways to do it, and that being
> > compatible with commercial databases may not be the right way to do it
> > for core Python.
> >
>
> OK. Let me rephrase it. Say we form a consensus on
> 'the right way'. Are you amenable to some solution
> which goes back before 1970 and after 2038 going into
> the standard library?
>
> And does your answer change if it involves some
> compiled code as well?
>
> I mention mxDateTime because it was agreed by a Python
> SIG, is mature and stable, and I find it very useful.
> And the core type is pretty small - much of the helper
> stuff in the package now could be kept separate from
> the main Python distribution.
>
> - Andy
=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.
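For context on the 1970 and 2038 limits mentioned in the forwarded message: they are what a signed 32-bit count of seconds since the Unix epoch can represent. The sketch below computes both ends of that range using today's standard-library datetime module as a stand-in (an assumption; it is not mxDateTime and did not exist at the time), and shows a date type that is not tied to that counter.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest and smallest instants a signed 32-bit seconds counter can hold.
latest = EPOCH + timedelta(seconds=2**31 - 1)
earliest = EPOCH + timedelta(seconds=-(2**31))
print(latest)     # 2038-01-19 03:14:07+00:00
print(earliest)   # 1901-12-13 20:45:52+00:00

# A date type that stores calendar fields handles dates outside that range.
long_ago = datetime(1850, 6, 1, tzinfo=timezone.utc)
far_ahead = datetime(2100, 1, 1, tzinfo=timezone.utc)
print((far_ahead - long_ago).days)
```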