Phasing a tabbed file? - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Phasing a tabbed file? (/thread-6921.html)
Phasing a tabbed file? - Joseph_f2 - Dec-13-2017

Hi there,

I am having trouble with what I believe is a very simple problem. I have a text file which looks like this:

    Cat 1
        Sub Cat 1
        Sub Cat 2
        Sub Cat 3
            Testing
    Cat 2
        Sub Cat 1
            Nested item
            Another nested item
                Nested-nested item
                Nested-nested item 2
            Yet another nested item
        Sub Cat 2
        Sub Cat 3

The lines end in \n and four spaces represent a tab (\t - I couldn't add tabs here).

I am trying to convert this list of items into a nested array with a structure similar to that of a file system. Ideally, the result should look something like this:

    {"Cat 1": {
        "Sub Cat 1",
        "Sub Cat 2",
        {"Sub Cat 3": {"Testing"}}
     },
     "Cat 2": {
        {"Sub Cat 1": {"Nested item", {"Another nested item": {"Nested nested item", "Nested nested item 2"}}, "Yet another nested item"},
        "Sub Cat 2",
        "Sub Cat 3"}
    }

Afterwards, I would like to convert the nested array into both JSON and CSV format to be written to the file system. Please feel free to change the format of the result and nested results, so long as they can be output correctly.

RE: Phasing a tabbed file? - j.crater - Dec-13-2017

What in particular is the trouble you are having? Post the code of your attempt at tackling the problem in Python code tags, and any error messages in error code tags, and we will look into it.

RE: Phasing a tabbed file? - ODIS - Dec-14-2017

If you are using a dictionary as your "nested array", you'll have a problem with syntax because this is not a valid dictionary:

    { "value", "key": "value" }

You cannot mix values with and without keys (a dictionary literal without keys is actually a different data type, a set).

RE: Phasing a tabbed file? - Joseph_f2 - Dec-14-2017

(Dec-13-2017, 09:51 PM) j.crater Wrote: What in particular is the trouble you are having? Post the code of your attempt of tackling the problem in Python code tags, and potential error messages in error code tags, and we will look into it.
Hi there, here's my current code:

    f = open("Categories List.txt", "r")
    cList = f.read()
    f.close()
    cList = cList.split("\n")
    array = {}
    lastR = ""
    for r in cList:
        if r[0] == "\t":
            array[lastR].append(r[1:])
        else:
            lastR = r
            array[lastR] = []
    print(array)

As you can see, it works fine with only one level of indentation. However, once a line is prefixed by multiple tabs, it is not inserted at the correct position in the array. I am just not sure how to store the data in such a way that you can dynamically add to any depth of the array.

RE: Phasing a tabbed file? - Joseph_f2 - Dec-14-2017

(Dec-14-2017, 12:44 AM) ODIS Wrote: If you are using dictionary as your "nested array", you'll have a problem with syntax because this is not a valid dictionary:

So which data types should I use to do this correctly, and how should I mix them?

RE: Phasing a tabbed file? - ODIS - Dec-15-2017

Well, I would process your categories/items structure with some OOP abstraction. Here is a quick solution:

    import json
    from typing import List


    class Item:
        def __init__(self, name: str):
            self.name = name

        def get_name(self) -> str:
            return self.name


    class Category:
        def __init__(self, name: str):
            self.name = name
            self.parent_category = None
            self.subcategories = []
            self.items = []

        def set_parent(self, parent_category: "Category"):
            self.parent_category = parent_category

        def get_parent(self) -> "Category":
            if self.parent_category is None:
                raise Exception("You are trying to get non-existing parent category")
            return self.parent_category

        def add_subcategory(self, subcategory: "Category"):
            self.subcategories.append(subcategory)

        def get_subcategories(self) -> List["Category"]:
            return self.subcategories

        def add_item(self, item: Item):
            self.items.append(item)

        def get_items(self) -> List[Item]:
            return self.items

        def pop_last_item(self) -> Item:
            if len(self.items) == 0:
                raise Exception("Cannot pop last item - empty item list")
            return self.items.pop()

        def to_dict(self) -> dict:
            return {
                "category_name": self.name,
                "items": list(map(lambda item: item.get_name(), self.items)),
                "subcategories": list(map(lambda subcategory: subcategory.to_dict(), self.subcategories))
            }


    with open("./input_file.txt") as file:
        # the root category is a container for all first-level categories
        root_category = Category("root")
        current_category = root_category
        current_tabs_count = 0

        # iterate over all lines of the file
        for line in file.readlines():
            # strip the leading tabs from the line
            line_without_leading_tabs = line.lstrip("\t")

            # skip lines that are empty or contain only whitespace
            # (a blank line still carries its "\n", so compare after strip())
            if line_without_leading_tabs.strip() == "":
                continue

            # count the leading tabs
            leading_tabs_count = len(line) - len(line_without_leading_tabs)

            # a jump forward of two or more tabs is a syntax error
            if leading_tabs_count - current_tabs_count > 1:
                raise Exception("Syntax error - two tabs forward jump is not allowed")

            # a jump forward of exactly one tab turns the last item into a new category
            if leading_tabs_count - current_tabs_count == 1:
                new_category = Category(current_category.pop_last_item().get_name())
                new_category.set_parent(current_category)
                current_category.add_subcategory(new_category)
                current_category = new_category
            # otherwise we stay at the same level or go back up
            # to the appropriate parent category
            else:
                for _ in range(current_tabs_count - leading_tabs_count):
                    current_category = current_category.get_parent()

            # in all cases, add the new item and update the current tab count
            current_category.add_item(Item(line_without_leading_tabs.strip()))
            current_tabs_count = leading_tabs_count

    # extract the categories from the root category and dump them to JSON
    result = list(map(lambda category: category.to_dict(), root_category.get_subcategories()))
    print(json.dumps(result))

And here is the result in JSON:
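For comparison, here is a more compact sketch of the same idea, which is not the code posted in the thread. Instead of parent pointers it keeps a stack in which position d holds the container for entries at depth d, so a line can be attached at any depth. The names `parse_tabbed`, `iter_paths` and the inline `SAMPLE` text are made up for illustration. Representing every entry as a dict (a leaf is an empty dict) and writing one CSV row per leaf path are assumptions about what the output should look like, and duplicate names under the same parent would overwrite each other.

```python
import csv
import io
import json

# Hypothetical sample input: the file from the first post, with real tabs.
SAMPLE = (
    "Cat 1\n"
    "\tSub Cat 1\n"
    "\tSub Cat 2\n"
    "\tSub Cat 3\n"
    "\t\tTesting\n"
    "Cat 2\n"
    "\tSub Cat 1\n"
    "\t\tNested item\n"
    "\t\tAnother nested item\n"
    "\t\t\tNested-nested item\n"
    "\t\t\tNested-nested item 2\n"
    "\t\tYet another nested item\n"
    "\tSub Cat 2\n"
    "\tSub Cat 3\n"
)


def parse_tabbed(text):
    """Parse tab-indented lines into nested dicts (a leaf is an empty dict)."""
    root = {}
    stack = [root]  # stack[d] is the dict that receives entries at depth d
    for raw in text.splitlines():
        name = raw.lstrip("\t")
        if not name.strip():
            continue  # skip blank lines
        depth = len(raw) - len(name)
        if depth > len(stack) - 1:
            raise ValueError(f"indentation jumps too far: {raw!r}")
        del stack[depth + 1:]  # close any deeper levels that are still open
        node = {}
        stack[depth][name.strip()] = node
        stack.append(node)  # this entry may receive children at depth + 1
    return root


def iter_paths(tree, prefix=()):
    """Yield one tuple per leaf, listing the names from the top level down."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from iter_paths(children, path)
        else:
            yield path


tree = parse_tabbed(SAMPLE)

# JSON: the nested dicts serialize directly.
print(json.dumps(tree, indent=2))

# CSV: one row per leaf path; rows have different lengths.
buffer = io.StringIO()
csv.writer(buffer).writerows(iter_paths(tree))
print(buffer.getvalue())
```

Using dicts at every level sidesteps the set/dict mixing problem mentioned earlier in the thread, at the cost of leaves looking like empty categories.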