Phasing a tabbed file? - Printable Version

Python Forum (https://python-forum.io) > Python Coding > General Coding Help > Thread: Phasing a tabbed file? (/thread-6921.html)

Phasing a tabbed file? - Joseph_f2 - Dec-13-2017

Hi there,

I am having trouble with what I believe is a very simple problem. I have a text file which looks like this:
Cat 1
    Sub Cat 1
    Sub Cat 2
    Sub Cat 3
        Testing
Cat 2
    Sub Cat 1
        Nested item
        Another nested item
            Nested-nested item
            Nested-nested item 2
        Yet another nested item
    Sub Cat 2
    Sub Cat 3

Each line ends in \n, and each group of four spaces above stands for one tab (\t), because I couldn't paste real tabs here. I am trying to convert this list of items into a nested structure similar to a file-system tree.

Ideally, the result should look something like this:
{"Cat 1": {
    "Sub Cat 1",
    "Sub Cat 2",
    {"Sub Cat 3": {"Testing"}}
},
"Cat 2": {
    {"Sub Cat 1": {
        "Nested item",
        {"Another nested item": {"Nested-nested item", "Nested-nested item 2"}},
        "Yet another nested item"
    }},
    "Sub Cat 2",
    "Sub Cat 3"
}
}

Afterwards, I would like to convert the nested structure to both JSON and CSV and write them to files. Please feel free to change the format of the result and of the nested results, as long as they can be exported correctly.

RE: Phasing a tabbed file? - j.crater - Dec-13-2017

What in particular is the trouble you are having? Post the code of your attempt at tackling the problem in Python code tags, and any error messages in error code tags, and we will look into it.

RE: Phasing a tabbed file? - ODIS - Dec-14-2017

If you are using a dictionary as your "nested array", you'll run into a syntax problem, because this is not a valid dictionary:

{
    "value",
    "key": "value"
}

You cannot mix values with and without keys (a brace literal with values but no keys is actually a different data type, a set).
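A quick way to check this point in the interpreter is sketched below. The helper `parses` is a hypothetical name introduced only for this illustration; it just asks Python's built-in `compile` whether a string is a valid expression. The last part shows a second problem with the desired structure: even where the syntax is valid, a set cannot contain a dict, because dicts are unhashable.

```python
# A brace literal with only values builds a set, not a dict.
names = {"Sub Cat 1", "Sub Cat 2"}
print(type(names).__name__)  # set


def parses(source: str) -> bool:
    """Hypothetical helper: True if `source` is a syntactically valid expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


print(parses('{"key": "value"}'))           # True  (a dict)
print(parses('{"value", "key": "value"}'))  # False (bare values mixed with key: value pairs)

# Valid syntax, but it fails at runtime: a set member must be hashable, and a dict is not.
try:
    {"Sub Cat 1", {"Sub Cat 3": "Testing"}}
except TypeError as exc:
    print("TypeError:", exc)
```

So the nesting has to be built from dicts (and lists) rather than sets of dicts.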

RE: Phasing a tabbed file? - Joseph_f2 - Dec-14-2017

(Dec-13-2017, 09:51 PM)j.crater Wrote: What in particular is the trouble you are having? Post the code of your attempt at tackling the problem in Python code tags, and any error messages in error code tags, and we will look into it.

Hi there, here's my current code:

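# read the whole file and split it into lines (without the trailing "\n")
with open("Categories List.txt", "r") as f:
    cList = f.read().splitlines()

array = {}
lastR = ""

for r in cList:
    # skip blank lines; r[0] would raise IndexError on an empty string
    if not r:
        continue
    if r[0] == "\t":
        # indented line: add it (minus one tab) to the last top-level category
        array[lastR].append(r[1:])
    else:
        # top-level line: start a new category
        lastR = r
        array[lastR] = []

print(array)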

As you can see, it works fine with one level of indentation. However, once a line is prefixed by multiple tabs, it is not inserted at the correct position in the array. I am just not sure how to store the data so that I can add items dynamically at any depth.
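One common way to handle arbitrary depth is to keep a stack holding the container for each indentation level. The sketch below is an alternative to the class-based answer later in the thread, not the poster's method. It assumes the indentation is real tab characters and represents every node as a dict whose keys are its children, so a leaf maps to an empty dict. `parse_tabbed` and `SAMPLE_LINES` are hypothetical names for illustration.

```python
import json

# Hypothetical sample: the list from the first post, written with real tabs.
SAMPLE_LINES = [
    "Cat 1",
    "\tSub Cat 1",
    "\tSub Cat 2",
    "\tSub Cat 3",
    "\t\tTesting",
    "Cat 2",
    "\tSub Cat 1",
    "\t\tNested item",
    "\t\tAnother nested item",
    "\t\t\tNested-nested item",
    "\t\t\tNested-nested item 2",
    "\t\tYet another nested item",
    "\tSub Cat 2",
    "\tSub Cat 3",
]


def parse_tabbed(lines):
    """Turn tab-indented lines into nested dicts; a leaf maps to an empty dict."""
    root = {}
    # stack[d] is the dict that receives the children found at depth d
    stack = [root]
    for line in lines:
        name = line.strip()
        if not name:
            continue  # ignore blank lines (including trailing "\n" from a file)
        depth = len(line) - len(line.lstrip("\t"))
        if depth >= len(stack):
            raise ValueError(f"indentation jumps more than one level at {name!r}")
        # drop the levels that are deeper than this line's parent
        del stack[depth + 1:]
        node = {}
        stack[depth][name] = node
        stack.append(node)
    return root


tree = parse_tabbed(SAMPLE_LINES)
print(json.dumps(tree, indent=2))
```

Since Python 3.7, dicts keep insertion order, so siblings come out in file order and the result can be passed straight to `json.dumps`. One trade-off of this representation: two siblings with the same name would overwrite each other.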

RE: Phasing a tabbed file? - Joseph_f2 - Dec-14-2017

(Dec-14-2017, 12:44 AM)ODIS Wrote: If you are using a dictionary as your "nested array", you'll run into a syntax problem, because this is not a valid dictionary:
{
    "value",
    "key": "value"
}
You cannot mix values with and without keys (a brace literal with values but no keys is actually a different data type, a set).

So which data types should I use to do this correctly and how should I mix them?

RE: Phasing a tabbed file? - ODIS - Dec-15-2017

Well, I would model your categories/items structure with a small OOP abstraction. Here is a quick solution:

import json
from typing import List


class Item:
    def __init__(self, name: str):
        self.name = name

    def get_name(self) -> str:
        return self.name


class Category:
    def __init__(self, name: str):
        self.name = name
        self.parent_category = None
        self.subcategories = []
        self.items = []

    def set_parent(self, parent_category: "Category"):
        self.parent_category = parent_category

    def get_parent(self) -> "Category":
        if self.parent_category is None:
            raise Exception("You are trying to get a non-existent parent category")
        return self.parent_category

    def add_subcategory(self, subcategory: "Category"):
        self.subcategories.append(subcategory)

    def get_subcategories(self) -> List["Category"]:
        return self.subcategories

    def add_item(self, item: Item):
        self.items.append(item)

    def get_items(self) -> List[Item]:
        return self.items

    def pop_last_item(self) -> Item:
        # the last item is promoted to a category when the next line is indented deeper
        if not self.items:
            raise Exception("Cannot pop last item - empty item list")
        return self.items.pop()

    def to_dict(self) -> dict:
        return {
            "category_name": self.name,
            "items": [item.get_name() for item in self.items],
            "subcategories": [subcategory.to_dict() for subcategory in self.subcategories],
        }


with open("./input_file.txt") as file:
    # the root category is a container for all 1st-level categories
    root_category = Category("root")
    current_category = root_category
    current_tabs_count = 0
    # iterate over all lines of the file
    for line in file:
        # cut the tabs from the start of the line
        line_without_leading_tabs = line.lstrip("\t")
        # skip empty lines (a blank line still ends in "\n", so strip it before testing)
        if line_without_leading_tabs.strip() == "":
            continue
        # count the leading tabs
        leading_tabs_count = len(line) - len(line_without_leading_tabs)
        # a jump of two or more tabs forward is a syntax error
        if leading_tabs_count - current_tabs_count > 1:
            raise Exception("Syntax error - two tabs forward jump is not allowed")
        # a jump of exactly one tab forward turns the last item into a new category
        if leading_tabs_count - current_tabs_count == 1:
            new_category = Category(current_category.pop_last_item().get_name())
            new_category.set_parent(current_category)
            current_category.add_subcategory(new_category)
            current_category = new_category
        # otherwise we stay at the same level or go back up,
        # so we move to the appropriate parent category
        else:
            for _ in range(current_tabs_count - leading_tabs_count):
                current_category = current_category.get_parent()
        # in all cases, add the new item and update the current tab count
        current_category.add_item(Item(line_without_leading_tabs.strip()))
        current_tabs_count = leading_tabs_count


# extract the 1st-level categories from the root category and dump them to JSON
# (indent=3 with compact separators gives the layout shown below)
result = [category.to_dict() for category in root_category.get_subcategories()]
print(json.dumps(result, indent=3, separators=(",", ":")))

And here is the result in JSON:

[
   {
      "category_name":"Cat 1",
      "items":[
         "Sub Cat 1",
         "Sub Cat 2"
      ],
      "subcategories":[
         {
            "category_name":"Sub Cat 3",
            "items":[
               "Testing"
            ],
            "subcategories":[]
         }
      ]
   },
   {
      "category_name":"Cat 2",
      "items":[
         "Sub Cat 2",
         "Sub Cat 3"
      ],
      "subcategories":[
         {
            "category_name":"Sub Cat 1",
            "items":[
               "Nested item",
               "Yet another nested item"
            ],
            "subcategories":[
               {
                  "category_name":"Another nested item",
                  "items":[
                     "Nested-nested item",
                     "Nested-nested item 2"
                  ],
                  "subcategories":[]
               }
            ]
         }
      ]
   }
]
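The reply above covers the JSON half of the question; the CSV half is not addressed in the thread. Because CSV is flat, some layout has to be chosen. The sketch below assumes one row per category or item, written as its full path with one column per nesting level. That layout, the helper name `dict_to_rows`, and the trimmed sample data are all assumptions for illustration, using the `category_name`/`items`/`subcategories` dicts produced by `to_dict()` above.

```python
import csv
import io

# Hypothetical sample: the "Cat 1" part of the JSON output above.
result = [
    {
        "category_name": "Cat 1",
        "items": ["Sub Cat 1", "Sub Cat 2"],
        "subcategories": [
            {"category_name": "Sub Cat 3", "items": ["Testing"], "subcategories": []}
        ],
    }
]


def dict_to_rows(category, path=()):
    """Yield one path tuple for the category itself, then one per item and subcategory."""
    path = path + (category["category_name"],)
    yield path
    for item in category["items"]:
        yield path + (item,)
    for subcategory in category["subcategories"]:
        yield from dict_to_rows(subcategory, path)


# StringIO keeps the demo self-contained; for a real file, use
# open("categories.csv", "w", newline="") (a hypothetical filename).
buffer = io.StringIO()
writer = csv.writer(buffer)
for category in result:
    writer.writerows(dict_to_rows(category))
print(buffer.getvalue())
```

Rows have different lengths, which `csv.writer` accepts; a spreadsheet will simply show empty cells in the deeper columns. Note that `to_dict()` lists items before subcategories, so the rows follow that order rather than the original line order.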