Python Forum
I need help with a "deallocating none" error
#1
I've been trying to track down an issue with a Python extension compiled from C (using SWIG). I've spent days trying to debug this in the Python debugger and gdb, and I have hit a wall. Any sort of hint as to what else to look at would be greatly appreciated.

The following information was retrieved by executing gdb using python3-dbg (python version 3.8).

Python crashes with this error:
Fatal Python error: deallocating None
Python runtime state: initialized

The offending line of python code on the call stack is:
value = self.get(name)
(The value of name should be a valid string)

The first call to none_dealloc in the stack trace is here:
#26 0x000000000046ef6e in none_dealloc (ignore=<optimized out>) at ../Objects/object.c:1585
#27 0x00000000004706da in _Py_Dealloc (op=<optimized out>) at ../Objects/object.c:2215
#28 0x00000000004470e8 in _Py_DECREF (op=<optimized out>, lineno=430, filename=0x699244 "../Objects/frameobject.c") at ../Include/object.h:478
#29 frame_dealloc (f=0x25938d0) at ../Objects/frameobject.c:430
#30 0x00000000004706da in _Py_Dealloc (op=op@entry=0x25938d0) at ../Objects/object.c:2215
#31 0x00000000004e02e6 in _Py_DECREF (op=0x25938d0, lineno=4314, filename=0x6e3343 "../Python/ceval.c") at ../Include/object.h:478
#32 _PyEval_EvalCodeWithName (_co=0x7fdc678c0520, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=2, kwnames=0x0, kwargs=0x7fdc6468cf90, kwcount=<optimized out>, kwstep=1, defs=0x7fdc678bf6f8, defcount=1, kwdefs=0x0, closure=0x0, name=0x7fdc67d0c270, qualname=0x7fdc678c2280) at ../Python/ceval.c:4314

This is confusing to me, since the CPython code initiating the dealloc at frameobject.c:430 in Python 3.8 is:
    /* Kill all local variables */
    valuestack = f->f_valuestack;
    for (p = f->f_localsplus; p < valuestack; p++)
        Py_CLEAR(*p);
The definition of Py_CLEAR (below) checks for NULL before trying to release the pointer. NULL is the same as "None", right? When I look at the value of op or _py_tmp in the call stack, gdb reports it as "optimized out".
#define Py_CLEAR(op)                            \
    do {                                        \
        PyObject *_py_tmp = _PyObject_CAST(op); \
        if (_py_tmp != NULL) {                  \
            (op) = NULL;                        \
            Py_DECREF(_py_tmp);                 \
        }                                       \
    } while (0)
I get the feeling though that something in my code is corrupting a pointer within Python scope. This error is pretty repeatable - it usually happens while processing the same set of data.

The last thing that I feel like I could try is compiling a flavor of python-dbg without optimization so I can look further into some of the things that gdb says are "optimized out". Is there anything else I can try?
#2
Of course, None is a real object (Py_None in C) and doesn't map to NULL, so that clears up some of my confusion there. The question is, what could I have done that would make something None in the Python local variables there?
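To make the None-versus-NULL distinction concrete, here is a minimal sketch that inspects the None singleton from Python. It assumes CPython, where `id()` returns the object's memory address; the helper name `describe_none` is made up for illustration.

```python
import sys


def describe_none():
    """Return basic facts about the None singleton in the running interpreter."""
    return {
        # In CPython, id() is the object's memory address, so this is the
        # non-NULL pointer that C code sees as Py_None.
        "address": id(None),
        "type_name": type(None).__name__,
        # An ordinary (large) count on CPython before 3.12; on 3.12 and later
        # None is immortal and this is a fixed sentinel value.
        "refcount": sys.getrefcount(None),
    }


info = describe_none()
print(hex(info["address"]), info["type_name"], info["refcount"])
```

Because the address is non-zero, the `_py_tmp != NULL` check in Py_CLEAR passes for a local that holds None, and Py_DECREF runs on it like on any other object.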
#3
Sometimes it helps me to type things out so it gets me to think more critically. That's rubber duck debugging for you. Sorry for all the chatter just from me.

It occurred to me that this is most likely caused by something calling Py_DECREF one too many times, causing None's reference count to hit zero prematurely. Does that sound right? Edit: Nope! It was the opposite of that.

While I still haven't exactly found my issue, this has given me something else to think about. My C/Python library doesn't call Py_DECREF directly, but the SWIG layer does. I doubt there is a fundamental issue in SWIG, as I'm sure it has been tested pretty well by the community. That has gotten me thinking about other libraries which may be the problem.

My next step is to find the name of the variable that is being deallocated and context in which it exists. The only way I know to do that is to compile Python without optimization so I can dig into those details. That is unless maybe I'm not looking deep enough into the gdb call stack. I'm not so familiar with how the Python stack is organized under the hood, so I've been just poking around for strings.
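One cheaper check that could be tried before rebuilding Python is to watch None's reference count around a suspect call. The sketch below is only an illustration: `none_refcount_delta` is a hypothetical helper, and it assumes CPython before 3.12 (such as the 3.8 build in this thread), where None is reference counted like any other object. On 3.12 and later None is immortal, so the delta should always be zero.

```python
import sys


def none_refcount_delta(func, *args, repeat=1000, **kwargs):
    """Call func repeatedly and report how far None's refcount drifted.

    A well-behaved function leaves the count unchanged. A wrapper that
    returns Py_None without Py_INCREF loses one reference per call that
    returns None, so the delta approaches -repeat.
    """
    before = sys.getrefcount(None)
    for _ in range(repeat):
        func(*args, **kwargs)  # result is discarded, releasing one reference
    after = sys.getrefcount(None)
    return after - before


# Baseline with a builtin that is known to be well behaved.
baseline = none_refcount_delta(len, "abc", repeat=100)
print("drift for len():", baseline)
```

Running the same helper on the extension call (for example `none_refcount_delta(obj.get, name)`, where `obj` and `name` stand in for the real object and key) with an input that makes it return None should show a clearly negative drift if the wrapper is dropping references.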
#4
I may have found a bug in SWIG-generated code!? Py_None is being returned here without a call to Py_INCREF. Fixing that still doesn't correct my issue, but something like this seems like it would be the culprit (if I'm understanding these dynamics correctly).

Edit: Never mind about it being a SWIG issue; this was my own mistake. SWIG was just following orders! The missing Py_INCREF was indeed what was causing my crash. In my excitement after finding it, I forgot to install the recompiled binary before running the test. It runs fine now!

SWIGINTERN PyObject *_wrap_SomeClass_get(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
// ... 
result = (arg1)->get((std::string const &)*arg2);
  {
    if ((&result)->size() > 0) {
      resultobj = PyBytes_FromStringAndSize((&result)->data(),(&result)->size());
    } else {
      resultobj = Py_None;
    }
  }
  if (SWIG_IsNewObj(res2)) delete arg2;
  return resultobj;
}
The fix is to take a new reference to None before returning it:
SWIGINTERN PyObject *_wrap_SomeClass_get(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
// ... 
result = (arg1)->get((std::string const &)*arg2);
  {
    if ((&result)->size() > 0) {
      resultobj = PyBytes_FromStringAndSize((&result)->data(),(&result)->size());
    } else {
      resultobj = Py_None;
      Py_INCREF(resultobj);
    }
  }
  if (SWIG_IsNewObj(res2)) delete arg2;
  return resultobj;
}
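To see why one missing Py_INCREF eventually ends in "deallocating None", here is a toy model of the reference counting involved. It is a plain-Python simulation, not the CPython implementation: `ToyObject`, `incref`, `decref`, `buggy_get`, and `fixed_get` are all invented names, and the starting count of 3 is arbitrary (the real None starts with a far larger count, which is why the crash only shows up after many calls).

```python
class ToyObject:
    """Stand-in for a PyObject: just a name and a reference count."""

    def __init__(self, name, refcount):
        self.name = name
        self.refcount = refcount


def incref(obj):
    obj.refcount += 1


def decref(obj):
    obj.refcount -= 1
    if obj.refcount == 0:
        # The real interpreter aborts here when the object is None.
        raise RuntimeError(f"deallocating {obj.name}")


def buggy_get(none):
    # Mirrors `resultobj = Py_None;` with no Py_INCREF: the caller is
    # handed a reference that nobody actually added.
    return none


def fixed_get(none):
    # Mirrors the fix: take a new reference before returning.
    incref(none)
    return none


def call_and_discard(getter, none, times):
    """Simulate the interpreter calling getter and dropping each result."""
    for _ in range(times):
        result = getter(none)
        decref(result)  # the caller releases the reference it believes it owns


toy_none = ToyObject("None", refcount=3)
call_and_discard(fixed_get, toy_none, times=100)
print(toy_none.refcount)  # still 3: every decref was matched by an incref

try:
    call_and_discard(buggy_get, toy_none, times=100)
except RuntimeError as exc:
    print(exc)  # deallocating None (raised on the third call)
```

The fixed getter can be called any number of times, while the buggy one drains one reference per call until the count reaches zero. That matches the crash being repeatable on the same data set: it takes a fixed number of empty results to exhaust the count.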