
[SOAR-18956] Mimecast V2 - Update hash limits #3167

Merged
14 commits merged into develop on Feb 26, 2025

Conversation


@ablakley-r7 (Collaborator) commented Feb 21, 2025

Proposed Changes

Description

Describe the proposed changes:

  • Update hash limits for state to reduce state size
  • Update log limit to 7500 for all logs
  • Context: SOAR-18956

PR Requirements

Developers, verify you have completed the following items by checking them off:

Testing

Unit Tests

Review our documentation on generating and writing plugin unit tests

  • Unit tests written for any new or updated code

In-Product Tests

If you are an InsightConnect customer or have access to an InsightConnect instance, the following in-product tests should be done:

  • Screenshot of job output with the plugin changes
  • Screenshot of the changed connection, actions, or triggers input within the InsightConnect workflow builder

Style

Review the style guide

  • For dependencies, pin OS package and Python package versions
  • For security, set least privileged account with USER nobody in the Dockerfile when possible
  • For size, use the slim SDK images when possible: rapid7/insightconnect-python-3-38-slim-plugin:{sdk-version-num} and rapid7/insightconnect-python-3-38-plugin:{sdk-version-num}
  • For error handling, use of PluginException and ConnectionTestException
  • For logging, use self.logger
  • For docs, use changelog style
  • For docs, validate markdown with insight-plugin validate which calls icon_validate to lint help.md

Functional Checklist

  • Work fully completed
  • Functional
    • Any new actions/triggers include JSON test files in the tests/ directory created with insight-plugin samples
    • Tests should all pass unless it's a negative test. Negative tests have a naming convention of tests/$action_bad.json
    • Unsuccessful tests should fail by raising an exception (causing the plugin to die); successful tests should return an object
    • Add functioning test results to PR, sanitize any output if necessary
      • Single action/trigger insight-plugin run -T tests/example.json --debug --jq
      • All actions/triggers shortcut insight-plugin run -T all --debug --jq (use PR format at end)
    • Add functioning run results to PR, sanitize any output if necessary
      • Single action/trigger insight-plugin run -R tests/example.json --debug --jq
      • All actions/triggers shortcut insight-plugin run --debug --jq (use PR format at end)

Assessment

You must validate your work to reviewers:

  1. Run insight-plugin validate and make sure everything passes
  2. Run the assessment tool: insight-plugin run -A. For single action validation: insight-plugin run tests/{file}.json -A
  3. Copy (insight-plugin ... | pbcopy) and paste the output in a new post on this PR
  4. Add required screenshots from the In-Product Tests section

@ablakley-r7 requested a review from a team as a code owner on February 21, 2025 09:59
@@ -79,15 +109,41 @@ def get_siem_batches(
         urls = [batch.get("url") for batch in batch_list]
         return urls, batch_response.get("@nextPage"), caught_up

-    def get_siem_logs_from_batch(self, url: str):
+    def resume_from_batch(
Collaborator

Have we overcomplicated this ability to resume from the new list of files? We're doing the loop multiple times, with multiple comparisons both times. Could this not just iterate over list_of_batches if we have a saved_url? Once we hit the saved_url match, slice from that index to the end. Later, when we feed pool_data into get_siem_logs_from_batch, if the file name is saved_url use that index; otherwise it defaults to zero.
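The reviewer's slice-from-saved-index idea could be sketched roughly like this. The names list_of_batches and saved_url come from the comment; the helper itself is hypothetical, not the plugin's actual code:

```python
def resume_slice(list_of_batches, saved_url):
    # If we stopped partway through a previous run, drop every batch
    # before the saved URL and keep it plus everything after it.
    if saved_url in list_of_batches:
        return list_of_batches[list_of_batches.index(saved_url):]
    # Saved URL not found (e.g. first run): process the full list.
    return list_of_batches
```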

Collaborator Author

I've refactored resume_batch and now use partial to spread the saved values into the get_siem_logs_from_batch function, instead of maintaining them in a tuple that resume_batch generates.
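The functools.partial approach described here might look roughly like the following. The function body and names are illustrative stand-ins, not the plugin's actual code:

```python
from functools import partial

def get_siem_logs_from_batch(url, saved_url=None, saved_position=0):
    # Only the batch we stopped in mid-file resumes from saved_position;
    # every other batch starts from the beginning.
    start = saved_position if url == saved_url else 0
    return f"fetching {url} from line {start}"

urls = ["https://example.com/a.gz", "https://example.com/b.gz"]
# Bind the saved resume point once, instead of packing (url, position)
# tuples for every batch; the pool then only maps over plain URLs.
worker = partial(
    get_siem_logs_from_batch,
    saved_url="https://example.com/b.gz",
    saved_position=42,
)
results = [worker(url) for url in urls]
```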

        return pool_data

    def get_siem_logs_from_batch(self, url_and_position: Tuple[str, int]) -> Tuple[List[Dict], str]:
        url, line_start = url_and_position
Collaborator

If we simplify the resume_from_batch logic, this could be along the lines of:

    def get_siem_logs_from_batch(self, url, starting_url, starting_position):
        starting_position = starting_position if url == starting_url else 1
        <rest of logic can stay the same>

Collaborator Author

I've refactored resume_batch and now use partial to spread the saved values into the get_siem_logs_from_batch function, instead of maintaining them in a tuple that resume_batch generates.

        return pool_data

    def get_siem_logs_from_batch(self, url_and_position: Tuple[str, int]) -> Tuple[List[Dict], str]:
        url, line_start = url_and_position
        response = requests.request(method=GET, url=url, stream=False)
Collaborator

Out of interest, does stream=True work for this endpoint? When I looked into this before, stream support usually means the API returns a Content-Length, which would tell us whether the file is going to exceed our content limit.

Collaborator Author

It does, and we can see the content length in bytes, though I'm not sure how immediately useful that is. I'm thinking we would have to know the compression ratio and the average size of each log JSON?
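A minimal sketch of the Content-Length pre-check being discussed: with requests this would pair with stream=True so the body isn't downloaded before the check. The limit value and helper name here are assumptions, not the plugin's actual API:

```python
MAX_COMPRESSED_BYTES = 50 * 1024 * 1024  # illustrative limit, not the real one

def within_size_limit(headers: dict, limit: int = MAX_COMPRESSED_BYTES) -> bool:
    # Content-Length is the *compressed* size; as the thread notes, we
    # would still need a compression-ratio estimate to bound log counts.
    content_length = headers.get("Content-Length")
    if content_length is None:
        return True  # header absent: cannot pre-check, proceed and guard later
    return int(content_length) <= limit
```

In practice this would consume `response.headers` from a `requests.get(url, stream=True)` call before reading the body.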

        for batch_logs, url in result:
            if isinstance(batch_logs, (List, Dict)):
                with lock:
                    total_count.value = total_count.value + len(batch_logs)
Collaborator

There are quite a few repeated len calls and calculations here; could we simplify it?

Suggested change

    -    total_count.value = total_count.value + len(batch_logs)
    +    total_batch_logs = len(batch_logs)
    +    total_count.value = total_count.value + total_batch_logs
    +    if total_count.value >= log_size_limit:
    +        leftover_logs_count = total_count.value - log_size_limit
    +        saved_position = total_batch_logs - leftover_logs_count
    +        batch_logs = batch_logs[0:saved_position]
    +    logs.extend(batch_logs)
    <contd>

Using this should hopefully help memory a tad as well, since we're not making new variables but slicing and dicing the current one.

Collaborator Author

Have added that in to remove a length calculation. I think the rest is fine; we only make the calculations necessary to get our counts. The subsection of logs could be done with negative slicing, but then we would also need a min check for batches that return only one log.
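The counting-and-trimming logic discussed in this thread, stripped of the shared-memory counter and lock, could be distilled into something like this sketch (log_size_limit comes from the diff; the function itself is illustrative):

```python
def trim_to_limit(batches, log_size_limit):
    # Accumulate logs batch by batch, cutting the final batch short so
    # the total never exceeds log_size_limit.
    logs = []
    total = 0
    for batch_logs in batches:
        total += len(batch_logs)
        if total >= log_size_limit:
            leftover = total - log_size_limit
            # Keep only enough of this batch to reach the limit exactly.
            logs.extend(batch_logs[: len(batch_logs) - leftover])
            break
        logs.extend(batch_logs)
    return logs
```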

        response = requests.request(method=GET, url=url, stream=False)
        with gzip.GzipFile(fileobj=BytesIO(response.content), mode="rb") as file_:
            logs = []
            # Iterate over lines in the decompressed file, decode and load the JSON
-           for line in file_:
+           for _, line in enumerate(file_, start=line_start):
                decoded_line = line.decode("utf-8").strip()
                logs.append(json.loads(decoded_line))
Collaborator

I'm just remembering we had issues on v1 where some files could contain malformed JSON and we got stuck in a loop; should we allow for this in here again and continue to the next file?

@ablakley-r7 force-pushed the soar-18956_mimecast_v2 branch from af0c414 to 2c81ddd on February 25, 2025 17:23
-                logs.append(json.loads(decoded_line))
-        return logs
+                try:
+                    logs.append(json.loads(decoded_line))
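The malformed-JSON guard being added here, in isolation, amounts to something like the following sketch (the helper name is hypothetical; the plugin's real code iterates a decompressed gzip file object):

```python
import json

def parse_lines(lines):
    # Skip malformed records instead of letting one bad line raise and
    # stall the whole batch, as happened on v1.
    logs = []
    for raw in lines:
        decoded = raw.decode("utf-8").strip()
        try:
            logs.append(json.loads(decoded))
        except json.JSONDecodeError:
            continue  # malformed record: move on to the next line
    return logs
```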
Collaborator

Honestly, I'm a bit wary that we could be opening huge gzip files here and loading the entire content into memory. Do we know if there's a limit on the files from Mimecast, or is it worth looking at how we used docker stats in the past? It does cause issues with catching the exception, though, and I might be being overcautious.

Collaborator Author

Updated to now use chunking, which keeps us from storing the full response in memory.
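One hedged way to implement that chunked approach: stream the response (for example via `response.iter_content`) and decompress incrementally with zlib, so neither the full gzip body nor the full decompressed file sits in memory at once. The function below is a sketch under those assumptions, not the plugin's actual code:

```python
import zlib

def iter_decompressed_lines(chunks):
    # wbits=47 (32 + 15) lets zlib auto-detect the gzip header.
    decompressor = zlib.decompressobj(wbits=47)
    buffer = b""
    for chunk in chunks:
        buffer += decompressor.decompress(chunk)
        # Yield complete newline-terminated records as they appear.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line
    # Flush any remaining decompressed bytes (last line may lack "\n").
    buffer += decompressor.flush()
    if buffer:
        yield buffer
```

Each yielded line can then be decoded and passed to `json.loads` exactly as before, keeping peak memory proportional to one chunk plus one record.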

@ablakley-r7 merged commit 6bb44ac into develop on Feb 26, 2025
12 checks passed